Pointwise Relations between Information and
Estimation in Gaussian Noise 

Kartik Venkat*, Tsachy Weissman* 
Abstract 

Many of the classical and recent relations between information and estimation in the presence of Gaussian noise 
can be viewed as identities between expectations of random quantities. These include the I-MMSE relationship of 
Guo et al.; the relative entropy and mismatched estimation relationship of Verdú; the relationship between causal
estimation and mutual information of Duncan, and its extension to the presence of feedback by Kadota et al.; the
relationship between causal and non-causal estimation of Guo et al., and its mismatched version of Weissman. We
dispense with the expectations and explore the nature of the pointwise relations between the respective random 
quantities. The pointwise relations that we find are as succinctly stated as - and give considerable insight into - the 
original expectation identities. 

As an illustration of our results, consider Duncan's 1970 discovery that the mutual information is equal to half the
causal MMSE in the AWGN channel, which can equivalently be expressed by saying that the difference between the
input-output information density and half the causal estimation error is a zero mean random variable (regardless of the
distribution of the channel input). We characterize this random variable explicitly, rather than merely its expectation.
Classical estimation and information theoretic quantities emerge with new and surprising roles. For example, the
variance of this random variable turns out to be given by the causal MMSE (which, in turn, is equal to twice the mutual
information by Duncan's result).

Index Terms 

Mutual information, minimum mean square error, Brownian motion, information density, Gaussian channel, 
causal/filtering error, non-causal/smoothing error, Radon-Nikodym derivative, Girsanov theory, Ito calculus, 
scalar estimation, continuous time estimation 

I. Introduction 

The literature abounds with results that relate classical quantities in information and estimation theory. Of 
particular elegance are relations that have been established in the presence of additive Gaussian noise. In this 
work, we refine and deepen our understanding of these relations by exploring their 'pointwise' properties. 

Duncan, in [1], showed that for the continuous-time additive white Gaussian noise channel, the minimum mean
squared filtering (causal estimation) error is twice the input-output mutual information for any underlying signal
distribution. Another discovery was made by Guo et al. in [2], where the derivative of the mutual information with
respect to the signal-to-noise ratio was found to equal half the minimum mean squared error in non-causal estimation.
By combining these two intriguing results, the authors of [2] also establish the remarkable equality of the causal mean
squared error (at some 'signal to noise' level snr) and the non-causal error averaged over 'signal to noise' ratio
uniformly distributed between 0 and snr. There have been extensions of these results to the presence of mismatch. In
this case, the relative entropy and the difference of the mismatched and matched mean squared errors are bridged
together: mismatched estimation in the scalar Gaussian channel was considered by Verdú in [3]. In [4], a generalization
of Duncan's result to incorporate mismatch for the full generality of continuous time processes is provided. In [5],
Kadota et al. generalize Duncan's theorem to the presence of feedback. These, and similar interconnections between
information and estimation are quite intriguing, and merit further study of their inner workings, which is the goal of
this paper.

* Stanford University. Email: kvenkat@stanford.edu, tsachy@stanford.edu

The basic information-estimation identities, such as the ones mentioned above, can be formulated as expectation 
identities. We explicitly characterize the random quantities involved in a pointwise sense, and in the process elicit 
new connections between information and estimation for the Gaussian channel. Girsanov theory and Ito calculus 
provide us with tools to understand the pointwise behavior of these random quantities, and to explore their properties. 

The paper is organized as follows. In Section II we present and discuss our main results. In Section III we
further develop and expand our results and observations for the setting of scalar random variables. The detailed
proofs are provided in Section IV. We conclude in Section V with a summary of our main findings.



II. Main Results 

A. Scalar Estimation 

1) Matched Case: We begin by describing the problem setting. We are looking at mean square estimation in the
presence of additive white Gaussian noise. This problem is characterized by an underlying clean signal $X$ (which
follows a law $P_X$) and its AWGN corrupted version $Y_\gamma$ measured at a given 'signal-to-noise ratio' $\gamma$,
which is to say

$$ Y_\gamma \mid X \sim \mathcal{N}(\sqrt{\gamma}\, X, 1), \tag{1} $$

or equivalently (after rescaling the observation by $\sqrt{\gamma}$)

$$ Y_\gamma \mid X \sim \mathcal{N}(\gamma X, \gamma), \tag{2} $$

where $\mathcal{N}(\mu, \sigma^2)$ denotes the Gaussian distribution with mean $\mu$ and variance $\sigma^2$.

In a communication setting, we are interested in the mutual information between the input $X$ and the output $Y_\gamma$.
It quantifies the ability of the channel to convey useful information. In an estimation setting, one would be interested
in using the observed output to estimate the underlying input signal while minimizing a given loss function. Define
mmse($\gamma$) to be the minimum mean square error at 'signal-to-noise ratio' $\gamma$,

$$ \mathrm{mmse}(\gamma) = E\left[ (X - E[X \mid Y_\gamma])^2 \right]. \tag{3} $$



Intriguing ties have been discovered between the input-output mutual information and the mean squared estimation
loss for the Gaussian channel. In [2], Guo et al. discovered the I-MMSE relationship, which tells us that for the
additive Gaussian channel, the following relationship holds between the minimum mean square error and the mutual
information between input $X$ and output $Y_{\mathrm{snr}}$ (the subscript making the 'signal-to-noise ratio' explicit):

$$ \frac{d}{d\,\mathrm{snr}} I(X; Y_{\mathrm{snr}}) = \frac{1}{2}\, \mathrm{mmse}(\mathrm{snr}). \tag{4} $$

Writing (4) in its integral form,

$$ I(X; Y_{\mathrm{snr}}) = \frac{1}{2} \int_0^{\mathrm{snr}} \mathrm{mmse}(\gamma)\, d\gamma. \tag{5} $$

Recall that we can express the mutual information between two random variables as

$$ I(X; Y) = E\left[ \log \frac{dP_{Y|X}}{dP_Y} \right], \tag{6} $$



where the quantity in the brackets denotes the log Radon-Nikodym derivative of the measure induced by the
conditional law of $Y \mid X$ with respect to the measure induced by the law of $Y$. This quantity is referred to in some
parts of the literature as the input-output information density $i(X, Y)$ (cf. [6]). In particular, let us look at the
following additive Gaussian channel at 'signal-to-noise ratio' $\gamma$,

$$ dY_\gamma = X\, d\gamma + dW_\gamma, \tag{7} $$

for $\gamma \in [0, \mathrm{snr}]$, where $W_\cdot$ is a standard Brownian motion [7], independent of $X$. Recall that
$W_\gamma \sim \mathcal{N}(0, \gamma)$.

Now that $(X, Y_0^{\mathrm{snr}})$ are on the same probability space (where throughout $Y_0^{\mathrm{snr}}$ is shorthand
for $\{Y_\gamma,\ 0 \le \gamma \le \mathrm{snr}\}$), it is meaningful to interchange the expectation and integration in the
right hand side of (5) to yield the following equivalent representation of the I-MMSE result:

$$ E\left[ \log \frac{dP_{Y_{\mathrm{snr}}|X}}{dP_{Y_{\mathrm{snr}}}} \right] = E\left[ \frac{1}{2} \int_0^{\mathrm{snr}} (X - E[X \mid Y_\gamma])^2\, d\gamma \right]. \tag{8} $$



In other words, the I-MMSE relationship can be restated succinctly as

$$ E[Z] = 0, \tag{9} $$

where

$$ Z = \log \frac{dP_{Y_{\mathrm{snr}}|X}}{dP_{Y_{\mathrm{snr}}}} - \frac{1}{2} \int_0^{\mathrm{snr}} (X - E[X \mid Y_\gamma])^2\, d\gamma \tag{10} $$

denotes the "tracking error between the information density and half the squared error integrated over snr". But what
can we say about the random variable $Z$ itself, beyond the fact that it has zero mean? Is there a crisp characterization
of this random variable? The answer is captured in the following Proposition, where we present our first pointwise
result.

Proposition 1: Assume $X$ has finite variance. $Z$, as defined in (10), satisfies

$$ Z = \int_0^{\mathrm{snr}} (X - E[X \mid Y_\gamma]) \cdot dW_\gamma, \tag{11} $$

where the integral on the right hand side of (11) denotes the Ito integral with respect to $W_\cdot$.

In particular, the above characterization implies that $Z$ (viewed as a process indexed by snr) is a martingale, and
(by virtue of having zero expectation) directly implies the I-MMSE relationship in (9) (which is equivalent to (5)).
Another immediate consequence of Proposition 1 is the following:

Theorem 1: Assume $X$ has finite variance. Then

$$ \mathrm{Var}(Z) = \int_0^{\mathrm{snr}} \mathrm{mmse}(\gamma)\, d\gamma = 2\, I(X; Y_{\mathrm{snr}}). \tag{12} $$

Thus we observe a simple characterization of the second moment of the tracking error, in terms of classical estimation
and information quantities. The relationship in (12) tells us how far apart the information density and the estimation
error typically are, two quantities that we know to have equal expectations, and in particular that the variance of
their difference can be described directly in terms of the original estimation error.
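One way to see how Theorem 1 follows from Proposition 1 is through Ito's isometry (the same property invoked later for Corollary 7). The following is a derivation sketch rather than the proof given in Section IV; it assumes the integrand in (11) is square integrable and adapted to the filtration generated by $X$ and $W_\cdot$, which holds when $X$ has finite variance and is independent of $W_\cdot$.

```latex
\begin{align*}
\mathrm{Var}(Z)
  &= E\big[Z^2\big]
     && \text{since } E[Z] = 0 \text{ by (9)} \\
  &= E\Big[\Big(\int_0^{\mathrm{snr}} (X - E[X \mid Y_\gamma])\, dW_\gamma\Big)^{2}\Big]
     && \text{by (11)} \\
  &= \int_0^{\mathrm{snr}} E\big[(X - E[X \mid Y_\gamma])^2\big]\, d\gamma
     && \text{Ito isometry} \\
  &= \int_0^{\mathrm{snr}} \mathrm{mmse}(\gamma)\, d\gamma
     && \text{by (3)} \\
  &= 2\, I(X; Y_{\mathrm{snr}})
     && \text{by (5)}.
\end{align*}
```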

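2) Mismatched Case: We now turn to the scenario of mismatched estimation, where the underlying clean signal
$X$ is distributed according to $P$, while the decoder believes the law to be $Q$. [3] presents the following relationship
between the relative entropy of the true and mismatched output laws, and the difference between the mismatched
and matched estimation losses:

$$ D\big(P * \mathcal{N}(0, 1/\mathrm{snr}) \,\big\|\, Q * \mathcal{N}(0, 1/\mathrm{snr})\big) = \frac{1}{2} \int_0^{\mathrm{snr}} \big[ \mathrm{mse}_{P,Q}(\gamma) - \mathrm{mse}_{P,P}(\gamma) \big]\, d\gamma, \tag{13} $$

where $*$ denotes the convolution operation, and $\mathrm{mse}_{P,Q}(\gamma)$ is defined as

$$ \mathrm{mse}_{P,Q}(\gamma) = E_P\big[ (X - E_Q[X \mid Y_\gamma])^2 \big]. \tag{14} $$

Towards deriving a pointwise extension of (13), we note that it can be recast, assuming again the observation model
in (7), as the expectation identity

$$ E_P\left[ \log \frac{dP_{Y_{\mathrm{snr}}}}{dQ_{Y_{\mathrm{snr}}}} \right] = E_P\left[ \frac{1}{2} \int_0^{\mathrm{snr}} (X - E_Q[X \mid Y_\gamma])^2 - (X - E_P[X \mid Y_\gamma])^2\, d\gamma \right]. \tag{15} $$

Let $Z_M$ denote the difference between the random quantities appearing in the above expression, i.e.

$$ Z_M = \log \frac{dP_{Y_{\mathrm{snr}}}}{dQ_{Y_{\mathrm{snr}}}} - \frac{1}{2} \int_0^{\mathrm{snr}} (X - E_Q[X \mid Y_\gamma])^2 - (X - E_P[X \mid Y_\gamma])^2\, d\gamma. \tag{16} $$

In the following, we provide an explicit characterization of this random variable.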

Proposition 2: Assuming $X$ has finite variance under both $P$ and $Q$, $Z_M$ defined in (16) satisfies

$$ Z_M = \int_0^{\mathrm{snr}} (E_P[X \mid Y_\gamma] - E_Q[X \mid Y_\gamma]) \cdot dW_\gamma \quad P\text{-a.s.} \tag{17} $$

We observe that the above Ito integral is a martingale and consequently has zero expectation, $E[Z_M] = 0$, recovering
(15), i.e. Verdú's relation from [3]. A further implication that can be read off of Proposition 2 rather immediately
(as will be explicitly shown in Section IV) is the following:

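Theorem 2: Assuming $X$ has finite variance under both $P$ and $Q$, $Z_M$ defined in (16) satisfies

$$ \mathrm{Var}(Z_M) = \int_0^{\mathrm{snr}} \big[ \mathrm{mse}_{P,Q}(\gamma) - \mathrm{mse}_{P,P}(\gamma) \big]\, d\gamma = 2\, D\big(P * \mathcal{N}(0, 1/\mathrm{snr}) \,\big\|\, Q * \mathcal{N}(0, 1/\mathrm{snr})\big). \tag{18} $$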
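Proposition 2 and Theorem 2 can be probed numerically in a case where everything is available in closed form. The sketch below is an illustration under assumptions that are not in the text: the true law is $P = \mathcal{N}(0, 1)$, the mismatched law is $Q = \mathcal{N}(0, \sigma_Q^2)$ with $\sigma_Q^2 = 4$, and the channel (7) is discretized with an Euler scheme. The function names, step counts, and seed are hypothetical choices. Under these Gaussian assumptions, $E_P[X \mid Y_\gamma] = Y_\gamma / (1 + \gamma)$ and $E_Q[X \mid Y_\gamma] = \sigma_Q^2 Y_\gamma / (1 + \sigma_Q^2 \gamma)$.

```python
import numpy as np


def simulate_mismatch(snr=1.0, var_q=4.0, n_paths=20_000, n_steps=2_000, seed=0):
    """Euler simulation of channel (7) with X ~ P = N(0, 1) and an estimator
    that assumes Q = N(0, var_q) (an illustrative choice, not from the paper).

    Returns (z_def, z_ito): Z_M computed from its definition (16) and from
    the Ito integral in (17), one value per simulated path.
    """
    rng = np.random.default_rng(seed)
    dg = snr / n_steps
    x = rng.standard_normal(n_paths)   # true input, drawn from P
    y = np.zeros(n_paths)              # Y_gamma, with Y_0 = 0
    z_ito = np.zeros(n_paths)
    half_sq_diff = np.zeros(n_paths)
    for k in range(n_steps):
        g = k * dg
        est_p = y / (1.0 + g)                    # E_P[X | Y_gamma]
        est_q = var_q * y / (1.0 + var_q * g)    # E_Q[X | Y_gamma]
        dw = np.sqrt(dg) * rng.standard_normal(n_paths)
        z_ito += (est_p - est_q) * dw            # left-endpoint (Ito) sum
        half_sq_diff += 0.5 * ((x - est_q) ** 2 - (x - est_p) ** 2) * dg
        y += x * dg + dw
    # Output laws at level snr: P_Y = N(0, snr^2 + snr), Q_Y = N(0, var_q*snr^2 + snr).
    vp = snr**2 + snr
    vq = var_q * snr**2 + snr
    log_ratio = 0.5 * np.log(vq / vp) - 0.5 * y**2 * (1.0 / vp - 1.0 / vq)
    return log_ratio - half_sq_diff, z_ito


def two_relative_entropy(snr, var_q):
    """2 D(P * N(0, 1/snr) || Q * N(0, 1/snr)) for P = N(0, 1), Q = N(0, var_q)."""
    r = (1.0 + snr) / (1.0 + var_q * snr)   # ratio of the two output variances
    return r - 1.0 - np.log(r)
```

Calling `simulate_mismatch()` and comparing `z_ito.var()` with `two_relative_entropy(1.0, 4.0)` gives an empirical check of (18), while the root mean square gap between `z_def` and `z_ito` shrinks with the step size, in line with the pathwise identity (17).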
As in the non-mismatched case, we observe that the variance of the difference between the information and
estimation theoretic random variables, whose expectations comprise the respective two sides of Verdú's mismatch
relationship, has a distribution independent characterization in terms of the matched and mismatched estimation
errors and consequently, by yet another application of this same relationship of Verdú, in terms of the relative
entropy between the output distributions. In the following subsection we extend this line of inquiry and results from
the scalar case to that where the channel input is a continuous-time process.



B. Continuous Time 

We now turn to the continuous-time Gaussian channel. Let $X_0^T$ be the underlying noise-free process (with finite
power) to be estimated. The continuous time channel is characterized by the following relationship between the
input and output processes,

$$ dY_t = X_t\, dt + dW_t, \tag{19} $$

where $\{W_t\}_{t \ge 0}$ is a standard Brownian motion, independent of $X_0^T$.

1) "Pointwise Duncan": In [1], Duncan proved the equivalence of input-output mutual information to the filtering
squared error, of a finite powered continuous time signal $X_t$, corrupted according to (19) to yield the noise corrupted
process $Y_t$. The signal is observed for a time duration $[0,T]$. Denoting the time integrated filtering squared error,

$$ \mathrm{cmmse}(T) = \int_0^T E\big[ (X_t - E[X_t \mid Y^t])^2 \big]\, dt \tag{20} $$

and letting $I(X^T; Y^T)$ denote the input-output mutual information, Duncan's theorem then tells us that

$$ I(X^T; Y^T) = \frac{1}{2}\, \mathrm{cmmse}(T). \tag{21} $$

In [5], Kadota et al. extend this result to communication over channels with feedback, and in the recent [8] this result
is extended to more general scenarios involving the presence of feedback, and it is shown that (21) remains true in
these more general cases upon replacing the mutual information on the left hand side with directed information. In
Q, several properties of likelihood ratios and their relationships with estimation error are studied. We now proceed 
to describe a pointwise characterization of Duncan's theorem. 
Considering the random variable $D(T)$ defined as

$$ D(T) = \log \frac{dP_{Y^T|X^T}}{dP_{Y^T}} - \frac{1}{2} \int_0^T (X_t - E[X_t \mid Y^t])^2\, dt, \tag{22} $$

Duncan's theorem is equivalently expressed as

$$ E[D(T)] = 0. \tag{23} $$

We now present an explicit formula for $D(T)$ in the following Proposition.



Proposition 3: Let $D(T)$ be as defined in (22). Then,

$$ D(T) = \int_0^T (X_t - E[X_t \mid Y^t])\, dW_t \quad \text{a.s.} \tag{24} $$

Note that the right side of (24) is a stochastic integral with respect to the Brownian motion $W_\cdot$ driving the
noise in the channel. With this representation, Duncan's theorem follows from the mere fact that this stochastic
integral is a martingale and, in particular, has zero expectation.

On applying another basic property of the stochastic integral we get the following interesting result for the
variance of $D(T)$.



Theorem 3: For a continuous-time signal with finite power, $D(T)$ as defined in (22) satisfies

$$ \mathrm{Var}(D(T)) = \mathrm{cmmse}(T). \tag{25} $$

In conjunction with Duncan's theorem (21), we get the following relationship,

$$ \mathrm{Var}(D(T)) = \mathrm{cmmse}(T) = 2\, I(X^T; Y^T), \tag{26} $$

which parallels our discovery for scalar random variables in (12). Thus, we find that the pointwise tracking error
satisfies this intriguing distribution independent property, for the full generality of continuous time inputs for the
Gaussian channel. That the estimation error and mutual information emerge from this analysis in such a crisp
manner is quite satisfying.
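Theorem 3 can be illustrated on a process for which the causal estimate is computable. The sketch below is not taken from the paper: it assumes a stationary Ornstein-Uhlenbeck input $dX_t = -a X_t\, dt + b\, dB_t$, for which the causal estimate $E[X_t \mid Y^t]$ is given by the Kalman-Bucy filter, and it uses an Euler discretization of (19). The model, the function name `simulate_pointwise_duncan`, and all parameter values are assumptions made for the illustration.

```python
import numpy as np


def simulate_pointwise_duncan(T=2.0, a=1.0, b=np.sqrt(2.0),
                              n_paths=20_000, n_steps=1_000, seed=0):
    """Euler simulation of dY = X dt + dW for an Ornstein-Uhlenbeck input.

    Returns (d_ito, sq_err, cmmse):
      d_ito  - samples of the stochastic integral in (24),
      sq_err - samples of the integrated squared filtering error,
      cmmse  - integral of the Riccati error variance, approximating (20).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    p = b**2 / (2.0 * a)                      # stationary variance = error variance at t = 0
    x = np.sqrt(p) * rng.standard_normal(n_paths)
    x_hat = np.zeros(n_paths)                 # prior mean of X_0
    d_ito = np.zeros(n_paths)
    sq_err = np.zeros(n_paths)
    cmmse = 0.0
    for _ in range(n_steps):
        err = x - x_hat                       # X_t - E[X_t | Y^t]
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        d_ito += err * dw                     # left-endpoint (Ito) sum
        sq_err += err**2 * dt
        cmmse += p * dt
        dy = x * dt + dw
        # Kalman-Bucy filter update and Riccati equation (Euler steps).
        x_hat = x_hat - a * x_hat * dt + p * (dy - x_hat * dt)
        p = p + (-2.0 * a * p + b**2 - p**2) * dt
        # Advance the Ornstein-Uhlenbeck signal with independent driving noise.
        x = x - a * x * dt + b * np.sqrt(dt) * rng.standard_normal(n_paths)
    return d_ito, sq_err, cmmse
```

With the returned samples, `d_ito.mean()` should be close to zero, as in (23), and `d_ito.var()` should be close to `cmmse`, as in (25), up to Monte Carlo and discretization error.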



Remark 1: One can note that for $X_t \equiv X$ in the interval $[0,T]$, we can use the results in (25) and (24) to
recover Theorem 1 and its pointwise characterization in Proposition 1, respectively.



Among the additional immediate benefits of the characterization in (24) is that it allows us to infer facts about the
limiting behavior of the random variables involved, such as in the following theorem.

Theorem 4: Suppose that the process $\{X_t\}_{t \ge 0}$ satisfies

$$ \lim_{T \to \infty} \frac{1}{T^2}\, \mathrm{cmmse}(T) = 0 \tag{27} $$

(or, equivalently, by Duncan's theorem, $\lim_{T \to \infty} I(X^T; Y^T)/T^2 = 0$). Then,

$$ \mathop{\mathrm{l.i.m.}}_{T \to \infty} \frac{1}{T} \left[ \log \frac{dP_{Y^T|X^T}}{dP_{Y^T}} - \frac{1}{2} \int_0^T (X_t - E[X_t \mid Y^t])^2\, dt \right] = 0, \tag{28} $$

where l.i.m. denotes Limit in the Mean.

We already know from Duncan's theorem that the two quantities that make up $D(T)$, namely the information
density and the causal estimation error, are equal in expectation for every $T > 0$, but our formulation reveals much
more about the pointwise behavior of these random quantities in themselves. In particular, it is interesting to note
that so little is needed to guarantee the convergence in (28): not even wide sense stationarity of the marginal of
the underlying process is required. Indeed, any process with $\mathrm{Var}(X_t)$ growing sublinearly with $t$ is easily
seen to satisfy (27).
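The two claims above can be spelled out in a few lines. This is a sketch of the reasoning, not the proof from Section IV; the last step additionally assumes that $\mathrm{Var}(X_t)$ is bounded on bounded intervals.

```latex
\begin{align*}
E\Big[\Big(\tfrac{1}{T}\, D(T)\Big)^{2}\Big]
  &= \frac{1}{T^2}\, \mathrm{Var}(D(T))
   = \frac{1}{T^2}\, \mathrm{cmmse}(T) \xrightarrow[T \to \infty]{} 0
  && \text{by (23), (25) and (27)}, \\
\mathrm{cmmse}(T)
  &= \int_0^T E\big[(X_t - E[X_t \mid Y^t])^2\big]\, dt
   \le \int_0^T \mathrm{Var}(X_t)\, dt
  && \text{the MMSE never exceeds the prior variance}, \\
\frac{1}{T^2} \int_0^T \mathrm{Var}(X_t)\, dt
  &\le \frac{1}{T} \sup_{0 \le t \le T} \mathrm{Var}(X_t) \xrightarrow[T \to \infty]{} 0
  && \text{whenever } \mathrm{Var}(X_t) = o(t).
\end{align*}
```

The first line is exactly mean-square convergence of $D(T)/T$ to zero, which is the statement (28).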

2) Pointwise Mismatch: We now consider the setting in [4], where a continuous time signal $X_t$, distributed
according to a law $P$, is observed through additive Gaussian noise, and is estimated by an estimator that would have
been optimal if the signal had followed the law $Q$. In this general setting, the main result in [4] shows that the
relative entropy between the laws of the output for the two different underlying distributions ($P$ and $Q$) is exactly
half the difference between the mismatched and matched filtering errors. Let $Y_t$ be the continuous time AWGN
corrupted version of $X_t$ as given by (19). Let $P_{Y^T}$ and $Q_{Y^T}$ be the output distributions when the underlying
signal $X_0^T$ has law $P$ and $Q$, respectively. As before, $T$ denotes the time duration for which the process is
observed. We denote the mismatched causal mean squared error,

$$ \mathrm{cmse}_{P,Q}(T) = \int_0^T E_P\big[ (X_t - E_Q[X_t \mid Y^t])^2 \big]\, dt. \tag{29} $$

In this setting, [4] tells us that the relative entropy between the output distributions is half the difference between
the mismatched and matched filtering errors, i.e.

$$ D(P_{Y^T} \,\|\, Q_{Y^T}) = \frac{1}{2} \big[ \mathrm{cmse}_{P,Q}(T) - \mathrm{cmse}_{P,P}(T) \big]. \tag{30} $$

Define the pointwise difference between the log Radon-Nikodym derivative and half the mismatched causal squared
error difference,

$$ M(T) = \log \frac{dP_{Y^T}}{dQ_{Y^T}} - \frac{1}{2} \int_0^T (E_Q[X_t \mid Y^t] - X_t)^2 - (E_P[X_t \mid Y^t] - X_t)^2\, dt. \tag{31} $$

Note that according to the above definition, (30) can be equivalently stated as

$$ E[M(T)] = 0. \tag{32} $$

But in fact much more can be said about $M(T)$:

Proposition 4:

$$ M(T) = \int_0^T (E_P[X_t \mid Y^t] - E_Q[X_t \mid Y^t])\, dW_t \quad P\text{-a.s.}, \tag{33} $$

where $M(T)$ is as defined in (31), and $X_t$ is assumed to have finite power under the laws $P$ as well as $Q$.
We note that relation (30) is implied immediately by Proposition 4 due to the 'zero mean' property of the martingale
$M(T)$. But more can be read off of this result. For example, the following characterization of the variance of $M(T)$
will be shown in Section IV to follow quite directly from Proposition 4.



Theorem 5: $M(T)$ as defined in (31) satisfies

$$ \mathrm{Var}(M(T)) = \mathrm{cmse}_{P,Q}(T) - \mathrm{cmse}_{P,P}(T) = 2\, D(P_{Y^T} \,\|\, Q_{Y^T}). \tag{34} $$

Thus, the variance of $M(T)$ is exactly the difference between the causal mismatched and matched squared errors.
And further, from [4] we know that it is equal to twice the relative entropy between the output distributions according
to laws $P$ and $Q$.
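The link between Proposition 4 and Theorem 5 can be sketched with Ito's isometry and the orthogonality principle. The derivation below is a heuristic outline under the finite power assumption; the argument in Section IV may proceed differently.

```latex
\begin{align*}
\mathrm{Var}(M(T))
  &= \int_0^T E_P\big[(E_P[X_t \mid Y^t] - E_Q[X_t \mid Y^t])^2\big]\, dt
  && \text{Ito isometry applied to (33)}, \\
E_P\big[(X_t - E_Q[X_t \mid Y^t])^2\big]
  &= E_P\big[(X_t - E_P[X_t \mid Y^t])^2\big]
   + E_P\big[(E_P[X_t \mid Y^t] - E_Q[X_t \mid Y^t])^2\big]
  && \text{orthogonality under } P, \\
\mathrm{Var}(M(T))
  &= \mathrm{cmse}_{P,Q}(T) - \mathrm{cmse}_{P,P}(T)
  && \text{integrating over } [0, T] \text{ and using (29)}.
\end{align*}
```

The cross term vanishes in the second line because $E_P[X_t \mid Y^t] - E_Q[X_t \mid Y^t]$ is a function of $Y^t$, while $X_t - E_P[X_t \mid Y^t]$ is orthogonal under $P$ to every square integrable function of $Y^t$.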

3) Presence of Feedback: In the previous subsections, we explicitly characterized a pointwise relationship
between the log Radon-Nikodym derivatives associated with the informational quantities and the squared filtering
error. These characterizations give us a crisp understanding of well known information-estimation results such
as Duncan's theorem [1], and the equivalence between mismatched estimation and relative entropy [4] for the
continuous time setting. These results emerge as direct corollaries of our characterization of the tracking errors as
stochastic integrals. Further, we establish a new equivalence between estimation error and variance of the tracking
error.
In this section we revisit [5], where Kadota et al. present a generalization of Duncan's theorem for the additive
Gaussian channel with feedback. The channel input $\phi_t$ is a function of the underlying process $X_t$ as well as the
past outputs of the channel $Y_0^t$, in an additive Gaussian noise setting. The observation window is $t \in [0,T]$. The
channel can be represented as

$$ Y_t = \int_0^t \phi_s(Y_0^s, X_s)\, ds + W_t, \tag{35} $$

where, as usual, the standard Brownian motion $W_\cdot$ is independent of the underlying process $X_0^T$. In differential
form (and using the shorthand $\phi_t$ to represent $\phi_t(Y_0^t, X_t)$) we can rewrite (35) as

$$ dY_t = \phi_t\, dt + dW_t. \tag{36} $$

We denote the causal estimate of $\phi_t$ based on observations up until $t$ by

$$ \hat{\phi}_t = E[\phi_t \mid Y_0^t]. \tag{37} $$

Under mild regularity conditions on $\phi_t$, the mutual information between the input and output is equal to half the
causal mean squared error. With our notation, the main result of [5] is expressed as

$$ I(X_0^T; Y_0^T) = \frac{1}{2} \int_0^T E\big[ (\phi_t - \hat{\phi}_t)^2 \big]\, dt. \tag{38} $$

We use Girsanov theory to develop a pointwise relationship in this setting, akin to our treatment of Duncan's
theorem. Defining

$$ D_\phi(T) \triangleq \log \frac{dP_{Y^T|X^T}}{dP_{Y^T}} - \frac{1}{2} \int_0^T (\phi_t - \hat{\phi}_t)^2\, dt, \tag{39} $$

we have the following:

Theorem 6: For $\phi_t$ which satisfy a finite power criterion, we have

$$ D_\phi(T) = \int_0^T (\phi_t - \hat{\phi}_t)\, dW_t \quad \text{a.s.}, \tag{40} $$

where $D_\phi(T)$ is as defined in (39).

Parallel to our discovery in the pointwise treatment of Duncan's theorem, we can use Theorem 6 to deduce
various results. Note from (40) that $D_\phi(T)$ is a martingale. Therefore,

$$ E[D_\phi(T)] = 0, \tag{41} $$

recovering the main result of [5], namely (38). Using Ito's Isometry we also immediately obtain:

Corollary 7:

$$ \mathrm{Var}(D_\phi(T)) = \int_0^T E\big[ (\phi_t - \hat{\phi}_t)^2 \big]\, dt \tag{42} $$
$$ \phantom{\mathrm{Var}(D_\phi(T))} = \mathrm{cmmse}_\phi(T). \tag{43} $$

The above follows directly from the application of Ito's Isometry property to the stochastic integral in (40) and
noting that $E[D_\phi(T)] = 0$.
Thus, even for the generalized setting of communication over channels with feedback, we can characterize how
closely the information density and squared filtering error track each other. We note that the second moment of
the tracking error is equal to the filtering error for all finite powered distributions of the underlying signal. In
particular, these results may have applications in approximating the mutual information via estimation theoretic
quantities, for channels with feedback. In the special case when $\phi_t = X_t$, we recover the results obtained in the
pointwise treatment of Duncan's theorem in Section II-B1. Here, we would like to note that Theorem 6 can further
be extended to accommodate mismatch.

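Let us denote separately the causal estimates of $\phi_t$ under the two laws $P$ and $Q$ that govern the underlying
process $X_t$:

$$ \hat{\phi}_t^P = E_P[\phi_t \mid Y_0^t], \tag{44} $$
$$ \hat{\phi}_t^Q = E_Q[\phi_t \mid Y_0^t]. \tag{45} $$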



Define, 



A dPyl 



if- 



dt. 



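As will be shown in the next section, Girsanov theory allows us to establish, for this setting,

$$ D(P_{Y^T} \,\|\, Q_{Y^T}) = \frac{1}{2} \big[ \mathrm{cmse}^{\phi}_{P,Q}(T) - \mathrm{cmse}^{\phi}_{P,P}(T) \big], \tag{47} $$

where

$$ \mathrm{cmse}^{\phi}_{P,Q}(T) = \int_0^T E_P\big[ (\phi_t - \hat{\phi}_t^Q)^2 \big]\, dt. \tag{48} $$

Define,

$$ M_\phi(T) = \log \frac{dP_{Y^T}}{dQ_{Y^T}} - \frac{1}{2} \int_0^T (\phi_t - \hat{\phi}_t^Q)^2 - (\phi_t - \hat{\phi}_t^P)^2\, dt. \tag{49} $$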



The arguments given in the proof of Proposition 4 and the treatment of the non-mismatched case above can be
carried over to show the following:

Theorem 8: For a finite power constraint on $\phi_t$ under laws $P$ and $Q$, $M_\phi(T)$ as defined in (49) satisfies

$$ M_\phi(T) = \int_0^T (\hat{\phi}_t^P - \hat{\phi}_t^Q)\, dW_t \quad P\text{-a.s.} \tag{50} $$



Note that (47) follows directly from Theorem 8 by noting that $M_\phi(T)$ is a martingale and consequently has zero
mean. Thus, the generalized D-MSE relationship for channels with feedback is an expectation identity that arises
from the pointwise treatment of the tracking error in (49). As a corollary, we also obtain the generalized result for
the second moment of the tracking error, which acts as a bridge between the relative entropy and the difference of
the mismatched and matched filtering errors.

Corollary 9:

$$ \mathrm{Var}(M_\phi(T)) = \mathrm{cmse}^{\phi}_{P,Q}(T) - \mathrm{cmse}^{\phi}_{P,P}(T) = 2\, D(P_{Y^T} \,\|\, Q_{Y^T}). \tag{51} $$
The result in (51) can be specialized to obtain results similar to those obtained previously for "Pointwise Duncan",
and pointwise mismatch in the absence of feedback.

4) Pointwise I-MMSE for processes: In [2], Guo et al. present what is known as the I-MMSE relationship for the
Gaussian channel. This result states that the mutual information (at signal-to-noise ratio level 'snr') is the integral
over SNR (from level 0 to 'snr') of half the non-causal squared error.

In this subsection, we present a characterization of the pointwise nature of the I-MMSE relationship for processes
in the continuous time Gaussian channel. We first explain the channel model. Since we are now concerned with
two continuously varying parameters, namely time and 'snr', the Gaussian noise corrupting the signal is a standard
Brownian sheet $W_{t,\gamma}$. For a fixed $\gamma$, we let $W^{(\gamma)}$ denote the Brownian motion defined by
$W_t^{(\gamma)} = W_{t,\gamma}$. The channel then, at SNR $\gamma$, is given by

$$ dY_t^{(\gamma)} = \gamma X_t\, dt + dW_t^{(\gamma)}, \tag{52} $$

where $\{X_t\}_{0 \le t \le T}$ is the underlying noise free process, which is independent of the Brownian sheet. The output
of the channel at SNR $\gamma$ is denoted by $Y_0^{T,(\gamma)} = \{Y_t^{(\gamma)}\}_{0 \le t \le T}$. In this framework, the
I-MMSE relationship from [2] tells us that

$$ I(X_0^T; Y_0^{T,(\mathrm{snr})}) = \frac{1}{2} \int_0^{\mathrm{snr}} \mathrm{mmse}(\gamma)\, d\gamma, \tag{53} $$

where

$$ \mathrm{mmse}(\gamma) = \int_0^T E\big[ (X_t - E[X_t \mid Y_0^{T,(\gamma)}])^2 \big]\, dt. \tag{54} $$



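Note that (53) can now equivalently be stated as

$$ E\left[ \log \frac{dP_{Y_0^{T,(\mathrm{snr})}|X_0^T}}{dP_{Y_0^{T,(\mathrm{snr})}}} \right] = E\left[ \frac{1}{2} \int_0^{\mathrm{snr}} \int_0^T (X_t - E[X_t \mid Y_0^{T,(\gamma)}])^2\, dt\, d\gamma \right]. \tag{55} $$

Defining the pointwise difference between the input-output information density and half the non-causal error
integrated over time and SNR,

$$ N(T) \triangleq \log \frac{dP_{Y_0^{T,(\mathrm{snr})}|X_0^T}}{dP_{Y_0^{T,(\mathrm{snr})}}} - \frac{1}{2} \int_0^{\mathrm{snr}} \int_0^T (X_t - E[X_t \mid Y_0^{T,(\gamma)}])^2\, dt\, d\gamma, \tag{56} $$

we first note that the I-MMSE relationship in (55) can equivalently be stated as

$$ E[N(T)] = 0. \tag{57} $$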

In doing so, we formulate the I-MMSE relationship as an expectation identity. In the previous discussions, we 
observed an estimation theoretic flavor to the second moment of the tracking error in characterizations of both 
Duncan's result and that of mismatched estimation. This kind of a relationship turns out to hold also in our present 
context: 

Theorem 10: For a finite power continuous-time process $X^T$, $N(T)$ as defined in (56) satisfies

$$ \mathrm{Var}(N(T)) = \int_0^{\mathrm{snr}} \mathrm{mmse}(\gamma)\, d\gamma = 2\, I(X_0^T; Y_0^{T,(\mathrm{snr})}). \tag{58} $$

In establishing the above result in Section IV, we use a multidimensional version of Girsanov's theorem to
characterize the input-output information density of a piecewise constant process observed through AWGN. We
then use approximation arguments to extend the result to the class of finite power continuous time processes.

5) Pointwise causal vs. non-causal error: By a combination of Duncan's theorem and the I-MMSE result, the
authors of [2] establish the equivalence of the causal error at SNR level 'snr', and the non-causal error averaged
over SNR uniformly distributed between 0 and 'snr'. The input-output mutual information acts as a bridge between
the quantities. Let 'cmmse(snr)' denote the integral of the filtering error for the channel described in (52),

$$ \mathrm{cmmse}(\mathrm{snr}) = \int_0^T E\big[ (X_t - E[X_t \mid \{Y_s^{(\mathrm{snr})}\}_{0 \le s \le t}])^2 \big]\, dt. \tag{59} $$

Recalling the definition of the non-causal error 'mmse($\gamma$)' in (54), the causal vs. non-causal error relationship is

$$ \mathrm{cmmse}(\mathrm{snr}) = \frac{1}{\mathrm{snr}} \int_0^{\mathrm{snr}} \mathrm{mmse}(\gamma)\, d\gamma. \tag{60} $$







So far, we have presented pointwise characterizations of Duncan's result in Section II-B1 as well as the I-MMSE
relationship for continuous time processes in Section II-B4. Using these two characterizations, in Section IV we
develop a pointwise version of the celebrated estimation-theoretic result (60). Specifically, we show that under the
Brownian sheet induced channel described in (52), and under some regularity assumed on the process $X^T$, the
difference

$$ \int_0^T (X_t - E[X_t \mid Y_0^{t,(\mathrm{snr})}])^2\, dt - \frac{1}{\mathrm{snr}} \int_0^{\mathrm{snr}} \int_0^T (X_t - E[X_t \mid Y_0^{T,(\gamma)}])^2\, dt\, d\gamma \tag{61} $$

can be characterized as a difference between stochastic integrals. In particular, such a characterization immediately
implies (60).
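The expectation identity (60) can be checked by direct quadrature in a simple special case. The sketch below assumes, purely for illustration, a time-constant input $X_t \equiv X \sim \mathcal{N}(0, 1)$ on $[0, T]$ observed through (52); then the causal error variance at time $t$ is $1/(1 + \mathrm{snr}\, t)$ and the non-causal error variance at SNR $\gamma$ is $1/(1 + \gamma T)$ for every $t$. The function names and the values of `snr` and `T` are hypothetical.

```python
import numpy as np


def mmse_constant_gaussian(gamma, T):
    """Non-causal error (54) for X_t = X ~ N(0, 1): T / (1 + gamma * T)."""
    return T / (1.0 + gamma * T)


def cmmse_constant_gaussian(snr, T, n=200_000):
    """Causal error (59): integral over [0, T] of 1 / (1 + snr * t), midpoint rule."""
    t = (np.arange(n) + 0.5) * (T / n)
    return np.sum(1.0 / (1.0 + snr * t)) * (T / n)


def averaged_noncausal(snr, T, n=200_000):
    """Right-hand side of (60): non-causal error averaged over gamma in [0, snr]."""
    g = (np.arange(n) + 0.5) * (snr / n)
    return np.sum(mmse_constant_gaussian(g, T)) * (snr / n) / snr


snr, T = 3.0, 2.0
causal = cmmse_constant_gaussian(snr, T)
noncausal_avg = averaged_noncausal(snr, T)
closed_form = np.log1p(snr * T) / snr   # both sides equal log(1 + snr*T) / snr here
print(causal, noncausal_avg, closed_form)
```

In this special case both sides of (60) reduce to $\log(1 + \mathrm{snr}\, T)/\mathrm{snr}$, so the three printed numbers should agree up to quadrature error. The pointwise statement about (61) is stronger and is not tested by this check.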

6) Pointwise causal vs. anticausal error: Now, we present another interesting application of Proposition 3:
a pointwise treatment of the causal vs. anti-causal estimation error relationship. Duncan's theorem gives us the
remarkable equality between the causal squared error and input-output mutual information for the continuous-time 
Gaussian channel. Invoking the invariance of mutual information to the direction of time, it can be observed (as 
is noted in I^J) that the causal squared error is equal to the anticausal squared error (for a given 'snr'), regardless 
of the input distribution of the underlying process. Let Xt be the noise free stochastic process that is distributed 



according to law $P$. Let $Y_t$ be the continuous time AWGN corrupted version of $X_t$ at snr = 1, according to (19).
Let the observation window be $t \in [0,T]$.



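Let us now denote

$$ \tilde{X}_t = X_{T-t}, \tag{62} $$
$$ \tilde{Y}_t = Y_T - Y_{T-t}, \tag{63} $$
$$ B_t = W_T - W_{T-t}. \tag{64} $$

Note that the anti-causal estimation error for the original processes $(X^T, Y^T)$ is given by the causal estimation
error associated with these "tilded" processes $(\tilde{X}^T, \tilde{Y}^T)$, i.e.:

$$ \int_0^T E\big[ (X_t - E[X_t \mid Y_t^T])^2 \big]\, dt = \int_0^T E\big[ (\tilde{X}_t - E[\tilde{X}_t \mid \tilde{Y}^t])^2 \big]\, dt, \tag{65} $$

where $Y_t^T$ denotes the channel output observed over $[t, T]$.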
Define the difference of the causal and anti-causal estimation errors

$$ J(T) \triangleq \int_0^T (X_t - E[X_t \mid Y^t])^2\, dt - \int_0^T (X_t - E[X_t \mid Y_t^T])^2\, dt. \tag{66} $$

Then, we have the following pointwise result relating the difference of the causal and anti-causal errors.

Proposition 5: For a process $X^T$ with finite power, $J(T)$ as defined in (66) satisfies

$$ \frac{1}{2} J(T) = \int_0^T E[X_t \mid Y^t] \cdot dW_t - \int_0^T E[\tilde{X}_t \mid \tilde{Y}^t] \cdot dB_t \quad \text{a.s.} \tag{67} $$



We note that the right hand side of equation (67) is the difference of two martingales. Taking expectation on both
sides, we recover the equality of causal and anti-causal squared error,

$$ \int_0^T E\big[ (X_t - E[X_t \mid Y^t])^2 \big]\, dt = \int_0^T E\big[ (X_t - E[X_t \mid Y_t^T])^2 \big]\, dt, \tag{68} $$

using merely the fact that the Ito integrals on the right hand side of (67) have zero mean. Thus, we provide
a pointwise characterization of the causal vs. anti-causal errors. Duncan's theorem implies what is otherwise a
surprising result, that the causal and anti-causal squared errors are equal, as rederived in (68). Through (67), we
uncover the structure of and dependence between the random quantities involved and characterize their difference
as a difference of two zero mean stochastic integrals.

III. Scalar Setting: Examples, Alternative Couplings, Further Observations, and Identities



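Returning to the scalar channel setting of Subsection II-A, we introduce notation as follows:

Definition 1:

$$ I_1 = \log \frac{dP_{Y_{\mathrm{snr}}|X}}{dP_{Y_{\mathrm{snr}}}}, \tag{69} $$
$$ I_2 = \frac{1}{2} \int_0^{\mathrm{snr}} (X - E[X \mid Y_\gamma])^2\, d\gamma, \tag{70} $$
$$ Z = I_1 - I_2, \tag{71} $$

where $Z$, as in the previous section, is informally referred to as the "tracking error".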

A. The Original Coupling 

We studied the example of the scalar Gaussian channel corrupted by additive Gaussian noise where the additive
noise components for the different SNR levels were coupled via a standard Brownian motion, as in (7). We
characterized explicitly in Proposition 1 the tracking error $Z$ between the information density and half the estimation
error.

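To illustrate how explicit this characterization allows us to be, let us consider the case of $X \sim \mathcal{N}(0, 1)$, and
use Proposition 1 to characterize the distribution of the random variable $Z$. Note that in this case,

$$ E[X \mid Y_\gamma] = \frac{Y_\gamma}{\gamma + 1}. \tag{72} $$

Combining with (11), we have

$$ Z = \int_0^{\mathrm{snr}} \left( X - \frac{Y_\gamma}{\gamma + 1} \right) \cdot dW_\gamma \quad \text{a.s.} \tag{73} $$

Also, invoking the result in Theorem 1, we obtain the variance of $Z$ to be

$$ \mathrm{Var}(Z) = 2\, I(X; Y_{\mathrm{snr}}) \tag{74} $$
$$ \phantom{\mathrm{Var}(Z)} = \log(1 + \mathrm{snr}). \tag{75} $$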
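The Gaussian example lends itself to a numerical check. The following is a minimal Monte Carlo sketch, not part of the original analysis: it discretizes the channel (7) with an Euler scheme, evaluates $Z$ both from its definition (10) (using the closed-form Gaussian information density) and from the Ito integral (73), and compares the empirical variance with $\log(1 + \mathrm{snr})$ from (75). The function name `simulate_tracking_error`, the path and step counts, and the seed are arbitrary choices made for the illustration.

```python
import numpy as np


def simulate_tracking_error(snr=1.0, n_paths=20_000, n_steps=2_000, seed=0):
    """Euler simulation of dY = X dgamma + dW with X ~ N(0, 1).

    Returns (z_def, z_ito): the tracking error from definition (10) and
    from the Ito integral (73), one value per simulated path.
    """
    rng = np.random.default_rng(seed)
    dg = snr / n_steps
    x = rng.standard_normal(n_paths)
    y = np.zeros(n_paths)        # Y_gamma, with Y_0 = 0
    w = np.zeros(n_paths)        # W_gamma
    z_ito = np.zeros(n_paths)
    sq_err = np.zeros(n_paths)
    for k in range(n_steps):
        g = k * dg
        err = x - y / (1.0 + g)              # X - E[X | Y_gamma], see (72)
        dw = np.sqrt(dg) * rng.standard_normal(n_paths)
        z_ito += err * dw                    # left-endpoint (Ito) sum
        sq_err += err**2 * dg
        w += dw
        y += x * dg + dw
    # Information density at level snr:
    # Y_snr | X ~ N(snr * X, snr) and Y_snr ~ N(0, snr^2 + snr).
    info = 0.5 * np.log1p(snr) - w**2 / (2.0 * snr) + y**2 / (2.0 * snr * (1.0 + snr))
    return info - 0.5 * sq_err, z_ito
```

Running `z_def, z_ito = simulate_tracking_error()` and comparing `z_ito.var()` with `np.log(2.0)` probes (75) at snr = 1, while the root mean square gap between `z_def` and `z_ito` reflects only discretization error, in line with the pathwise identity of Proposition 1.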



We shall now look at the pointwise scalar estimation problem in a new light. Recall that in moving from (5) to (8)
we place all the random variables $(X, Y_0^{\mathrm{snr}})$ on the same probability space, via a standard Brownian motion,
as in (7). Note, however, that the only assumption for the original results that hold in expectation is that, for each
$\gamma > 0$, the channel satisfies (1), i.e.

$$ Y_\gamma \mid X \sim \mathcal{N}(\sqrt{\gamma}\, X, 1), \tag{76} $$

where $\mathcal{N}(\mu, \sigma^2)$ denotes the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. Taking the
channel noise variables for the various SNR levels to be the components of a Brownian motion, as in (7), is but one
possibility for a coupling that respects (76) (after normalizing the output at level $\gamma$ by $\sqrt{\gamma}$).



For $Z$ as defined in (71), the I-MMSE relationship tells us that $E[Z] = 0$, for all such $(X, Y_0^{\mathrm{snr}})$ that are
consistent with (76). It is instructive to note that (76) is a requirement on the channel for the individual SNR levels.
As mentioned, however, there are several ways in which we can couple the input $X$ and outputs $Y_0^{\mathrm{snr}}$ together
so that they satisfy (76). The I-MMSE relationship implies that for all such couplings we have $E[Z] = 0$. Before
exploring some other examples of such 'couplings' and their properties, let us note a refinement of this zero-mean
property pertaining to the random variable $Z$, which holds regardless of the coupling.

Proposition 6: Suppose $X$ has finite variance and that $Z$ is defined as in Definition 1 under a joint distribution
on $(X, Y_0^{\mathrm{snr}})$ satisfying (76). Then

$$ E[Z \mid X] = 0. \tag{77} $$



Thus, not only is the tracking error a zero-mean random variable, but even its conditional expectation E[Z|X] is 
zero. We use the setting and results in |^ to establish this result. The 1-MMSE relationship which states that E[Z] 
= 0, is then immediately implied by Proposition [6] 
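
As a numerical illustration of Proposition 6, the following sketch evaluates both conditional expectations for a standard Gaussian input. It assumes $X \sim \mathcal{N}(0,1)$ and the normalized channel $Y_\gamma = \sqrt{\gamma} X + N$ with $N \sim \mathcal{N}(0,1)$, for which $E[X \mid Y_\gamma] = \sqrt{\gamma} Y_\gamma / (1 + \gamma)$. The function names are hypothetical and the quadrature is a plain midpoint rule; this is a sanity check under those assumptions, not part of the proof.

```python
import math

def cond_mean_info_density(x, snr):
    """E[I1 | X = x] for X ~ N(0,1) and Y = sqrt(snr) X + N (closed form)."""
    return 0.5 * math.log1p(snr) + 0.5 * snr / (1.0 + snr) * (x * x - 1.0)

def cond_mean_half_sq_error(x, snr, steps=20000):
    """E[I2 | X = x] = 1/2 * int_0^snr E[(x - E[X|Y_g])^2 | X = x] dg.

    For the Gaussian input, x - E[X|Y_g] = (x - sqrt(g) N) / (1 + g), so the
    conditional second moment is (x^2 + g) / (1 + g)^2. Midpoint rule in g.
    """
    h = snr / steps
    total = 0.0
    for k in range(steps):
        g = (k + 0.5) * h
        total += (x * x + g) / (1.0 + g) ** 2
    return 0.5 * total * h

def cond_mean_tracking_error(x, snr):
    """E[Z | X = x] = E[I1 | X = x] - E[I2 | X = x]; Proposition 6 says this is 0."""
    return cond_mean_info_density(x, snr) - cond_mean_half_sq_error(x, snr)

if __name__ == "__main__":
    for x in (-2.0, 0.0, 0.7, 3.0):
        print(x, cond_mean_tracking_error(x, 1.0))
```

The two terms agree for every fixed value of x, not merely after averaging over X, which is the refinement that (77) expresses.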

Having briefly touched upon the idea of ways other than the channel in (7) in which we can comply with the marginal channel requirements in (76), let us look at some concrete examples and draw a comparison between them.



B. Additive Standard Gaussian 



An alternative coupling between X and $Y_\gamma$ that respects (76) is achieved by using a scaled standard Gaussian random variable as additive noise, instead of the Brownian motion considered in the previous setting. The channel is described by letting, for $\gamma \in [0, \mathrm{snr}]$,

\[ Y_\gamma = \sqrt{\gamma}\, X + N, \tag{78} \]

where $N \sim \mathcal{N}(0, 1)$ is independent of X. Note that in this coupling, the channel has the same noise component for all values of $\gamma$. We now present the pointwise characterization of the 'tracking error' for this setting in the following lemma.



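Lemma 11: Let $X \sim P_X$ have a finite second moment. For the channel in (78), the pointwise tracking error Z defined in Definition 1 can be expressed as

\[ Z = \int_0^{\mathrm{snr}} Z_\gamma \, d\gamma, \tag{79} \]

where $Z_\gamma$ is given by

\[ Z_\gamma = \frac{1}{2} \left\{ E[X^2 \mid Y_\gamma] - X E[X \mid Y_\gamma] - (X - E[X \mid Y_\gamma])^2 + \frac{Y_\gamma}{\sqrt{\gamma}} (X - E[X \mid Y_\gamma]) \right\}. \tag{80} \]

As a sanity check, one can observe that $E[Z_\gamma \mid Y_\gamma] = 0$ and thus $E[Z_\gamma] = 0$. Consequently, $E[Z] = \int_0^{\mathrm{snr}} E[Z_\gamma] \, d\gamma = 0$.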



Lemma 11 is closely related to the pointwise identity in [10, Theorem 2.3]. However, for completeness we present a stand-alone proof in Section IV.



Example 1: Applying Lemma 11 in the case where the channel input X is standard normal, i.e., $X \sim \mathcal{N}(0, 1)$, the tracking error is given by

\[ Z = \frac{1}{2} \left[ \log(1 + \mathrm{snr}) - N^2 \log(1 + \mathrm{snr}) + 2 X N \tan^{-1}(\sqrt{\mathrm{snr}}) \right] \quad \text{a.s.} \tag{81} \]

and thus, in particular, the variance of the tracking error is

\[ \mathrm{Var}(Z) = \frac{1}{2} \left( \log(1 + \mathrm{snr}) \right)^2 + \left( \tan^{-1}(\sqrt{\mathrm{snr}}) \right)^2. \tag{82} \]

For snr = 1, we present a plot of the cumulative distribution function of the random variable Z in (81) in Figure 1.
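
The passage from the integrand in Lemma 11 to the closed form in Example 1 can be checked numerically. The sketch below assumes $X \sim \mathcal{N}(0,1)$ and the channel $Y_\gamma = \sqrt{\gamma} X + N$, so that $E[X \mid Y_\gamma] = \sqrt{\gamma} Y_\gamma/(1+\gamma)$ and $E[X^2 \mid Y_\gamma] = 1/(1+\gamma) + E[X \mid Y_\gamma]^2$. The function names and the substitution $\gamma = u^2$ (used only to tame the integrable $1/\sqrt{\gamma}$ behaviour near zero) are illustrative choices, not taken from the paper.

```python
import math

def z_gamma(x, n, g):
    """Integrand of Lemma 11 for X ~ N(0,1) and Y_g = sqrt(g) x + n."""
    y = math.sqrt(g) * x + n
    m1 = math.sqrt(g) * y / (1.0 + g)      # E[X | Y_g]
    m2 = 1.0 / (1.0 + g) + m1 * m1         # E[X^2 | Y_g]
    return 0.5 * (m2 - x * m1 - (x - m1) ** 2 + y / math.sqrt(g) * (x - m1))

def z_numeric(x, n, snr, steps=20000):
    """Integrate z_gamma over [0, snr] using g = u^2 and the midpoint rule in u."""
    b = math.sqrt(snr)
    h = b / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += z_gamma(x, n, u * u) * 2.0 * u   # dg = 2 u du
    return total * h

def z_closed_form(x, n, snr):
    """Closed-form tracking error of Example 1."""
    log_term = math.log1p(snr)
    return 0.5 * (log_term - n * n * log_term
                  + 2.0 * x * n * math.atan(math.sqrt(snr)))

def var_z_additive(snr):
    """Variance of the tracking error in Example 1."""
    return 0.5 * math.log1p(snr) ** 2 + math.atan(math.sqrt(snr)) ** 2

if __name__ == "__main__":
    x, n, snr = 0.3, -1.2, 1.0
    print(z_numeric(x, n, snr), z_closed_form(x, n, snr))
```

For any fixed realization (x, n) the quadrature and the closed form should agree up to discretization error, which reflects that the identity holds pointwise and not only in expectation.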



C. Independent Standard Gaussians

We present yet another illustration of a different coupling which places the input and outputs of the Gaussian channel (76) on the same probability space. Unlike the previous two examples in Sections III-A and III-B, respectively, we here look at the limiting behavior of a family of couplings achieved by the construction described below.

Let $\Delta > 0$ be defined by $\Delta = \mathrm{snr}/M$ for M natural. Let $T_i = [(i - 1)\Delta, i\Delta)$ for $i \in \{1, 2, 3, \ldots, M\}$. Let $N_i$ be independent standard Gaussian random variables, $N_i \sim \mathcal{N}(0, 1)$. Now we define the following process

\[ Y_\gamma = \sqrt{\gamma}\, X + N_\gamma, \tag{83} \]

where $N_\gamma = N_i$ for $\gamma \in T_i$. Note that this is a coupling of the channel noise components at different SNRs that adheres to (76).
Fig. 1: C.D.F. of Z under the couplings discussed in Subsections III-B and III-C for a Gaussian input



We now evaluate $I_2$ as defined in (70) for this process



h = 



(84) 

(85) 
(86) 



We are interested in the limiting process when $\Delta$ is small. We consider the Riemann sum approximation of the above integral,

\[ S_M = \sum_{i=1}^{M} \frac{1}{2} (X - E[X \mid Y_{i\Delta}])^2 \, \Delta \tag{87} \]
\[ = \frac{\mathrm{snr}}{2M} \sum_{i=1}^{M} (X - E[X \mid Y_{i\Delta}])^2. \tag{88} \]

We now look at the conditional variance of $I_2$ given X,

\[ \mathrm{Var}[I_2 \mid X] = \frac{\mathrm{snr}^2}{4M^2} \sum_{i=1}^{M} \mathrm{Var}[(X - E[X \mid Y_{i\Delta}])^2 \mid X], \tag{89} \]

where $\mathrm{Var}[\,\cdot \mid X]$ denotes the conditional variance averaged over the distribution of X. Thus, under mild regularity conditions on the underlying distribution of X (for instance, $E[X^4] < \infty$ suffices) we can see that the summand in the r.h.s. is bounded above by a constant independent of M. Therefore the sum involves adding over M terms that are O(1), so the r.h.s. of (89) is O(1/M). In other words,

\[ \mathrm{Var}[I_2 \mid X] \to 0 \quad \text{in probability}, \tag{90} \]

and thus

\[ \text{l.i.m.}_{M \to \infty} \, I_2 = E[I_2 \mid X]. \tag{91} \]

Therefore, $Z = I_1 - I_2$, the tracking error, is given (in the l.i.m. sense as $M \to \infty$) by

\[ Z = I_1 - I_2 \tag{92} \]
\[ = I_1 - E[I_2 \mid X] \tag{93} \]
\[ = I_1 - E[I_1 \mid X], \tag{94} \]

where the last equality follows from equation (77). We now consider the case when $X \sim \mathcal{N}(0, 1)$, which satisfies (91), and thus we can apply (94) to explicitly calculate Z.



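Note that $I_1(X, Y_{\mathrm{snr}})$, as defined in Definition 1, depends only on the joint distribution of X and $Y_{\mathrm{snr}}$, and is therefore the same for all the channel couplings that are consistent with (76), for a fixed input distribution on X. Thus, for $X \sim \mathcal{N}(0, 1)$, $I_1$ can be computed explicitly using Definition 1 to yield

\[ I_1 = \frac{1}{2} \log(1 + \mathrm{snr}) + \frac{1}{2} \left[ \frac{Y_{\mathrm{snr}}^2}{1 + \mathrm{snr}} - (Y_{\mathrm{snr}} - \sqrt{\mathrm{snr}}\, X)^2 \right] \tag{95} \]

and

\[ E[I_1 \mid X] = \frac{1}{2} \log(1 + \mathrm{snr}) + \frac{1}{2} \, \frac{\mathrm{snr}}{1 + \mathrm{snr}} \, (X^2 - 1). \tag{96} \]

Let $N = Y_{\mathrm{snr}} - \sqrt{\mathrm{snr}}\, X$. Using (95) and (96), we simplify (94) to get the following closed form expression for Z:

\[ Z = \frac{1}{2(1 + \mathrm{snr})} \left[ -N^2 \, \mathrm{snr} + 2 X N \sqrt{\mathrm{snr}} + \mathrm{snr} \right]. \tag{97} \]

Further, the variance is given by

\[ \mathrm{Var}(Z) = \frac{\mathrm{snr}\,(2 + \mathrm{snr})}{2 (1 + \mathrm{snr})^2}. \tag{98} \]

For snr = 1, we present a plot of the cumulative distribution function of the random variable Z in (94) in Figure 1.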
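
The algebra leading to the closed form for the limiting tracking error can be verified directly. The sketch below assumes $X \sim \mathcal{N}(0,1)$ and $Y_{\mathrm{snr}} = \sqrt{\mathrm{snr}}\, X + N$; it evaluates the information density, its conditional mean given X, and the claimed closed form at arbitrary points. Function names are hypothetical.

```python
import math

def info_density(x, n, snr):
    """I1 for a standard Gaussian input, with Y = sqrt(snr) x + n."""
    y = math.sqrt(snr) * x + n
    return 0.5 * math.log1p(snr) + 0.5 * (y * y / (1.0 + snr) - n * n)

def cond_mean_info_density(x, snr):
    """E[I1 | X = x]: average of info_density over N ~ N(0,1)."""
    return 0.5 * math.log1p(snr) + 0.5 * snr / (1.0 + snr) * (x * x - 1.0)

def z_independent(x, n, snr):
    """Closed form of Z = I1 - E[I1 | X] for the limiting coupling."""
    return (-n * n * snr + 2.0 * x * n * math.sqrt(snr) + snr) / (2.0 * (1.0 + snr))

def var_z_independent(snr):
    """Variance of z_independent when X and N are independent N(0,1).

    Uses Var(N^2) = 2, Var(XN) = 1 and Cov(N^2, XN) = 0.
    """
    return snr * (2.0 + snr) / (2.0 * (1.0 + snr) ** 2)

if __name__ == "__main__":
    x, n, snr = 0.8, -0.5, 2.0
    lhs = info_density(x, n, snr) - cond_mean_info_density(x, snr)
    print(lhs, z_independent(x, n, snr))
```

The variance helper only restates the moment computation: the coefficient of $N^2$ contributes $2\,\mathrm{snr}^2$ and the cross term contributes $4\,\mathrm{snr}$ inside the bracket, before dividing by $4(1+\mathrm{snr})^2$.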



D. Comparison of Variances 

Previously, in Sections III-A, III-B and III-C, we have considered different couplings that are consistent with (76) and give rise to different pointwise relations between X and $Y_0^{\mathrm{snr}}$. In particular, for the specific channel input $X \sim \mathcal{N}(0, 1)$, we have explicit characterizations of the tracking error Z defined in Definition 1. We have also calculated the variance of this tracking error for each of these process evolutions, and they are given by (75), (82) and (98), respectively. Here, we compare these couplings in terms of the variance of the tracking error for the Gaussian input. This comparison effectively tells us which particular relationship results in a better pointwise tracking of the information density and the actual squared error of the MMSE estimators. Fig. 2 shows a plot of the error variances. We observe that in this example of a Gaussian input, the coupling III-C results in the lowest variance of the tracking error, while that of III-B results in the highest variance. We conjecture that to be the case in general, i.e., for any distribution of X with finite power.

Fig. 2: Variance of Tracking Error Z vs. snr for the cases in Sections III-A, III-B and III-C
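
The ordering of the three variances for the Gaussian input can be reproduced with a few lines of code. The sketch below assumes the closed forms $\log(1+\mathrm{snr})$ for the Brownian coupling of Section III-A, $\frac{1}{2}\log^2(1+\mathrm{snr}) + (\tan^{-1}\sqrt{\mathrm{snr}})^2$ for the additive coupling of Section III-B, and $\mathrm{snr}(2+\mathrm{snr})/(2(1+\mathrm{snr})^2)$ for the limiting coupling of Section III-C. The function names are hypothetical.

```python
import math

def var_brownian(snr):
    """Tracking-error variance under the Brownian motion coupling (III-A)."""
    return math.log1p(snr)

def var_additive(snr):
    """Tracking-error variance under the additive standard Gaussian coupling (III-B)."""
    return 0.5 * math.log1p(snr) ** 2 + math.atan(math.sqrt(snr)) ** 2

def var_independent(snr):
    """Tracking-error variance under the independent Gaussians coupling (III-C)."""
    return snr * (2.0 + snr) / (2.0 * (1.0 + snr) ** 2)

def ranking(snr):
    """Return the section labels sorted from lowest to highest variance."""
    values = {
        "III-A": var_brownian(snr),
        "III-B": var_additive(snr),
        "III-C": var_independent(snr),
    }
    return sorted(values, key=values.get)

if __name__ == "__main__":
    for snr in (0.1, 1.0, 10.0):
        print(snr, ranking(snr))
```

Note that the III-C variance stays below 1/2 for every snr, whereas the other two grow without bound, so the gap widens at high SNR.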

Remark 2: Similar to our presentation of alternative couplings in the scalar estimation problem, we could 
introduce a time-snr coupling of inputs and outputs in the continuous time estimation model different from the 



Brownian sheet dependence in Section II-B4. The difference in behavior observed would be consistent with
the differences we observe in the scalar scenario. However, a detailed analysis of the continuous time I-MMSE 
relationship for processes under alternative couplings is beyond the scope of this paper. 



E. An identity 

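As a final result, we present an interesting identity between two random quantities. The nature and applicability of such an identity need to be explored further, but at the very least it shows us the kind of identities that can easily be reaped from our pointwise framework. Let $X \sim P$. We consider two different channels as follows:

\[ Y_\gamma = \sqrt{\gamma}\, X + W_1, \tag{99} \]
\[ \tilde{Y}_\gamma = \gamma X + W_\gamma, \tag{100} \]

where $W_\cdot$ is a standard Brownian motion, independent of X. Thus, the coupling in (99) is the "Additive Standard Gaussian" one of Subsection III-B (with the role of N played by $W_1$), while that in (100) is our original Brownian motion coupling. We note that

\[ Y_1 = \tilde{Y}_1 \tag{101} \]

and consequently,

\[ \log \frac{dP_{Y_1|X}}{dP_{Y_1}} = \log \frac{dP_{\tilde{Y}_1|X}}{dP_{\tilde{Y}_1}} \quad \text{a.s.} \tag{102} \]

However, we are now in a position to invoke results derived in Section II-A to characterize the quantities on either side of (102). Using the definition of Z in Definition 1 and Proposition 1 we get, respectively,

\[ Z_1 = \log \frac{dP_{Y_1|X}}{dP_{Y_1}} - \frac{1}{2} \int_0^1 (X - E[X \mid Y_\gamma])^2 \, d\gamma, \tag{103} \]
\[ \log \frac{dP_{\tilde{Y}_1|X}}{dP_{\tilde{Y}_1}} - \frac{1}{2} \int_0^1 (X - E[X \mid \tilde{Y}_\gamma])^2 \, d\gamma = \int_0^1 (X - E[X \mid \tilde{Y}_\gamma]) \, dW_\gamma \quad \text{a.s.} \tag{104} \]

Combining the above we get

\[ \log \frac{dP_{Y_1|X}}{dP_{Y_1}} = Z_1 + \frac{1}{2} \int_0^1 (X - E[X \mid Y_\gamma])^2 \, d\gamma, \tag{105} \]
\[ \log \frac{dP_{\tilde{Y}_1|X}}{dP_{\tilde{Y}_1}} = \int_0^1 (X - E[X \mid \tilde{Y}_\gamma]) \, dW_\gamma + \frac{1}{2} \int_0^1 (X - E[X \mid \tilde{Y}_\gamma])^2 \, d\gamma, \tag{106} \]

and hence, by (102),

\[ Z_1 - \int_0^1 (X - E[X \mid \tilde{Y}_\gamma]) \, dW_\gamma = \frac{1}{2} \int_0^1 (X - E[X \mid \tilde{Y}_\gamma])^2 \, d\gamma - \frac{1}{2} \int_0^1 (X - E[X \mid Y_\gamma])^2 \, d\gamma \quad \text{a.s.} \tag{107} \]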



where all equalities are valid in the almost sure sense. Note that the relation in (107) is not only consistent with (and immediately implies) the I-MMSE relation, but also exhibits the pointwise relation between two different channel couplings. We can further plug in the expressions we have to characterize $Z_1$ (in Subsection III-B) to get equality relationships coupling the two channels considered.

IV. Proofs 

Having discussed the main results in Section II, we now present the proofs in detail. We begin by proving our main result for the "Pointwise Duncan" setting in Proposition 3. We then note that for the special case when the input is a DC process, the scalar estimation result in Proposition 1 follows directly from the continuous-time result.

Proof of Proposition 3: We recall that the input-output mutual information is the expected value of the log Radon-Nikodym derivative of the measure induced by the process $\{Y_0^T\}$ conditioned on $X_0^T$ with respect to the measure induced by the process $\{Y_0^T\}$. The expectation is with respect to the law P.
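Define the causal estimate of $X_t$,

\[ \hat{X}_t = E[X_t \mid Y_0^t]. \tag{108} \]

We characterize the information density by introducing the Radon-Nikodym derivative with respect to the standard Wiener measure [11]. Using the definition, and taking logarithm, we get

\[ \log \frac{dP_{Y_0^T|X_0^T}}{dP_{Y_0^T}} = \log \left( \frac{dP_{Y_0^T|X_0^T}}{d\mu} \Big/ \frac{dP_{Y_0^T}}{d\mu} \right) \tag{109} \]
\[ = \log \frac{dP_{Y_0^T|X_0^T}}{d\mu} - \log \frac{dP_{Y_0^T}}{d\mu}. \tag{110} \]

Let $\mu$ denote the standard Wiener measure on $Y_0^T$. We apply the Girsanov theorem [12] to denote the Radon-Nikodym derivatives of the conditional and marginal laws of $Y_0^T \mid X_0^T$ and $Y_0^T$, respectively, with respect to $\mu$, as follows:

\[ \log \frac{dP_{Y_0^T|X_0^T}}{d\mu} = \int_0^T X_t \, dY_t - \frac{1}{2} \int_0^T X_t^2 \, dt, \tag{111} \]
\[ \log \frac{dP_{Y_0^T}}{d\mu} = \int_0^T \hat{X}_t \, dY_t - \frac{1}{2} \int_0^T (\hat{X}_t)^2 \, dt, \tag{112} \]

where the equalities hold almost surely (a.s.). Therefore, from (110), (111) and (112) we have

\[ \log \frac{dP_{Y_0^T|X_0^T}}{dP_{Y_0^T}} = \int_0^T (X_t - \hat{X}_t) \, dY_t - \frac{1}{2} \int_0^T \left( X_t^2 - (\hat{X}_t)^2 \right) dt. \tag{113} \]

We shall now proceed to simplify (113). Note from (19) (snr = 1) that $dY_t = X_t \, dt + dW_t$, so that

\[ \log \frac{dP_{Y_0^T|X_0^T}}{dP_{Y_0^T}} = \int_0^T (X_t - \hat{X}_t)(X_t \, dt + dW_t) - \frac{1}{2} \int_0^T \left( X_t^2 - (\hat{X}_t)^2 \right) dt \tag{114} \]
\[ = \int_0^T \left[ X_t^2 - X_t \hat{X}_t - \frac{1}{2} X_t^2 + \frac{1}{2} (\hat{X}_t)^2 \right] dt + \int_0^T (X_t - \hat{X}_t) \, dW_t \tag{115} \]
\[ = \frac{1}{2} \int_0^T (X_t - \hat{X}_t)^2 \, dt + \int_0^T (X_t - \hat{X}_t) \, dW_t. \tag{116} \]

On re-arranging, we get the desired result

\[ D(T) = \int_0^T (X_t - E[X_t \mid Y_0^t]) \, dW_t \quad \text{a.s.} \tag{117} \]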



It is instructive to note that for the special case when $X_t = X$ and T = snr, Proposition 3 reduces directly to the scalar estimation setting in Section II-A, thereby giving a direct proof of Proposition 1. We now note that Theorem 3 also follows directly from Proposition 3 by using a familiar identity for Ito integrals.
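
The pointwise identity just proved can be observed on a single simulated path. The sketch below is an Euler-type discretization under stated assumptions: a DC input $X_t = X \sim \mathcal{N}(0,1)$, the channel $dY_t = X \, dt + dW_t$, and the resulting causal estimate $E[X \mid Y_0^t] = Y_t/(1+t)$. For this input the information density depends on the path only through $Y_T$ and equals $\frac{1}{2}\log(1+T) - W_T^2/(2T) + Y_T^2/(2T(1+T))$. The function name, step count and seed are illustrative.

```python
import math
import random

def simulate_pointwise_duncan(T=1.0, steps=10000, seed=0):
    """Return (D, ito) where D = info density - half the integrated squared
    filtering error, and ito is the left-point sum approximating the
    stochastic integral of the filtering error against dW."""
    rng = random.Random(seed)
    dt = T / steps
    x = rng.gauss(0.0, 1.0)        # DC input, X_t = X ~ N(0,1)
    y = 0.0                        # channel output, Y_0 = 0
    w = 0.0                        # driving Brownian motion
    half_sq_err = 0.0
    ito = 0.0
    for k in range(steps):
        t = k * dt
        x_hat = y / (1.0 + t)      # E[X | Y_0^t] for the Gaussian DC input
        err = x - x_hat
        dw = rng.gauss(0.0, math.sqrt(dt))
        half_sq_err += 0.5 * err * err * dt
        ito += err * dw            # non-anticipating (Ito) sum
        y += x * dt + dw
        w += dw
    info = 0.5 * math.log1p(T) - w * w / (2.0 * T) + y * y / (2.0 * T * (1.0 + T))
    return info - half_sq_err, ito

if __name__ == "__main__":
    for seed in range(3):
        print(simulate_pointwise_duncan(seed=seed))
```

On each path the two returned numbers should be close, with a gap that shrinks as the step size decreases. The left-point evaluation of the error matters: evaluating it at the right endpoint would approximate a different (anticipating) integral.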
Proof of Theorem 3:

\[ \mathrm{Var}(D(T)) = E\left[ (D(T))^2 \right] \tag{118} \]
\[ = E\left[ \left( \int_0^T (X_t - E[X_t \mid Y_0^t]) \, dW_t \right)^2 \right] \tag{119} \]
\[ = \int_0^T E\left[ (X_t - E[X_t \mid Y_0^t])^2 \right] dt \tag{120} \]
\[ = \mathrm{cmmse}(T) = 2 I(X_0^T; Y_0^T). \tag{121} \]

Here, (118) follows from the fact that E[D(T)] = 0, and (120) is a consequence of Ito's Isometry property [13, Chapter 6]. Note also that, by definition, (120) is the squared filtering error, denoted by cmmse(T) in (20), which in turn is twice the mutual information by Duncan's theorem. ■
We again observe that for the special choices $X_t = X$ and T = snr, we obtain the result for the scalar setting in Theorem 1, which tells us that the variance of the tracking error is equal to the minimum mean squared error integrated over SNR. Having established the pointwise results for the scalar and continuous time channels using Girsanov theory, we now proceed to prove the limit theorem that is a direct application of the pointwise treatment of Duncan's result.

Proof of Theorem 4: Note that Duncan's theorem tells us the equivalence of mutual information rate and half the time-averaged causal squared error, i.e.,

\[ \frac{1}{T} I(X_0^T; Y_0^T) = \frac{1}{T} \int_0^T \frac{1}{2} E\left[ (X_t - E[X_t \mid Y_0^t])^2 \right] dt. \tag{122} \]

Under the channel model in (19), we can change the order of expectation to equivalently state the above identity as:

\[ E\left[ \frac{1}{T} \log \frac{dP_{Y_0^T|X_0^T}}{dP_{Y_0^T}} \right] = E\left[ \frac{1}{T} \int_0^T \frac{1}{2} (X_t - E[X_t \mid Y_0^t])^2 \, dt \right]. \tag{123} \]

Thus we observe the equality in expectation of the information density and half the squared filtering error. Using our pointwise characterization of Duncan's result, we will now show that not only are the random quantities in (123) equal in expectation, but in the limit of large T, their difference converges to 0 in the mean square sense. Let us recall (22) and the result in (24). On dividing both sides of (24) by T, we get the following:

\[ \frac{1}{T} \log \frac{dP_{Y_0^T|X_0^T}}{dP_{Y_0^T}} = \frac{1}{T} \int_0^T \frac{1}{2} (X_t - E[X_t \mid Y_0^t])^2 \, dt + \frac{1}{T} \int_0^T (X_t - E[X_t \mid Y_0^t]) \cdot dW_t. \tag{124} \]

Let us take the limit as $T \to \infty$. We claim that under very basic regularity conditions on $X_t$, the second term in the r.h.s. goes to 0 in the mean square sense:

\[ \mathrm{Var}\left( \frac{1}{T} \int_0^T (X_t - E[X_t \mid Y_0^t]) \cdot dW_t \right) = \frac{1}{T^2} \mathrm{Var}\left( \int_0^T (X_t - E[X_t \mid Y_0^t]) \cdot dW_t \right) \tag{125} \]
\[ = \frac{1}{T^2} \int_0^T E\left[ (X_t - E[X_t \mid Y_0^t])^2 \right] dt. \tag{126} \]

Thus, if the expression in (126) goes to 0 in the limit $T \to \infty$, i.e., it satisfies the condition in (27), then from (124) we get the desired result:

\[ \text{l.i.m.}_{T \to \infty} \; \frac{1}{T} \left[ \log \frac{dP_{Y_0^T|X_0^T}}{dP_{Y_0^T}} - \frac{1}{2} \int_0^T (X_t - E[X_t \mid Y_0^t])^2 \, dt \right] = 0, \tag{127} \]

where l.i.m. denotes Limit in the Mean. ■
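
To see how quickly the normalized tracking error can vanish, consider the simplest case in which the variance in the argument above is available in closed form. The sketch assumes a DC input $X_t = X \sim \mathcal{N}(0,1)$, for which the causal MMSE at time t is $1/(1+t)$, so the normalized variance equals $\log(1+T)/T^2$. This example and the function name are illustrative assumptions, not taken from the paper.

```python
import math

def normalized_tracking_variance(T):
    """Variance of (1/T) * int_0^T (X - E[X | Y_0^t]) dW_t for X ~ N(0,1).

    Ito's isometry gives (1/T^2) * int_0^T 1/(1+t) dt = log(1+T) / T^2.
    """
    return math.log1p(T) / (T * T)

if __name__ == "__main__":
    for T in (1.0, 10.0, 100.0, 1000.0):
        print(T, normalized_tracking_variance(T))
```

Any input whose integrated causal error grows more slowly than $T^2$ behaves the same way qualitatively, which is the content of the regularity condition invoked in the proof.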

Having established the pointwise representation of Duncan's result and used it to extract results for scalar estimation, we now present the proof for continuous-time estimation with mismatch. We again note in this case that the scalar estimation results follow as a special case of the continuous-time version of the result.

Proof of Proposition 4: Let us denote the causal estimates of $X_t$ under each law:

\[ \pi_t^P = E_P[X_t \mid Y_0^t], \tag{128} \]
\[ \pi_t^Q = E_Q[X_t \mid Y_0^t]. \tag{129} \]

We first note that the innovations processes specified below,

\[ \nu_t^P = Y_t - \int_0^t \pi_s^P \, ds \tag{130} \]

and

\[ \nu_t^Q = Y_t - \int_0^t \pi_s^Q \, ds, \tag{131} \]



are standard Brownian motions under P and Q, respectively. We now apply Girsanov's theorem (also discussed in [4, Section IV.D]) to characterize the log Radon-Nikodym derivative of the observed process under laws P and Q, respectively.



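\[ \log \frac{dP_{Y_0^T}}{d\mu} = \int_0^T \pi_t^P \, dY_t - \frac{1}{2} \int_0^T (\pi_t^P)^2 \, dt, \tag{132} \]
\[ \log \frac{dQ_{Y_0^T}}{d\mu} = \int_0^T \pi_t^Q \, dY_t - \frac{1}{2} \int_0^T (\pi_t^Q)^2 \, dt. \tag{133} \]

We now use (132) and (133) to characterize the log Radon-Nikodym derivative of the measure induced by the output process $Y_0^T$ under P with respect to the measure induced under Q,

\[ \log \frac{dP_{Y_0^T}}{dQ_{Y_0^T}} = \log \frac{dP_{Y_0^T}/d\mu}{dQ_{Y_0^T}/d\mu} \tag{134} \]
\[ = \int_0^T (\pi_t^P - \pi_t^Q) \, dY_t - \frac{1}{2} \int_0^T \left[ (\pi_t^P)^2 - (\pi_t^Q)^2 \right] dt \tag{135} \]
\[ = \int_0^T (\pi_t^P - \pi_t^Q)(X_t \, dt + dW_t) - \frac{1}{2} \int_0^T \left[ (\pi_t^P)^2 - (\pi_t^Q)^2 \right] dt \tag{136} \]
\[ = \frac{1}{2} \int_0^T \left[ -(\pi_t^P)^2 + 2 \pi_t^P X_t - 2 \pi_t^Q X_t + (\pi_t^Q)^2 \right] dt + \int_0^T (\pi_t^P - \pi_t^Q) \, dW_t \tag{137} \]
\[ = \frac{1}{2} \int_0^T \left[ (\pi_t^Q - X_t)^2 - (\pi_t^P - X_t)^2 \right] dt + \int_0^T (\pi_t^P - \pi_t^Q) \, dW_t, \tag{138} \]

where (136) follows from (19).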
Note from (31) and (138) that M(T) is a stochastic integral and is expressed as:

\[ M(T) = \int_0^T (\pi_t^P - \pi_t^Q) \, dW_t \quad P\text{-a.s.} \tag{139} \]

We now invoke Ito's Isometry property along with Proposition 4 to prove the second moment result in Theorem 5.

Proof of Theorem 5: Using the isometry property of Ito integrals, we can compute the variance of M(T) as follows:

\[ \mathrm{Var}(M(T)) = E_P\left[ \int_0^T (\pi_t^P - \pi_t^Q)^2 \, dt \right] \tag{140} \]
\[ = E_P\left[ \int_0^T \left( (X_t - \pi_t^Q)^2 - (X_t - \pi_t^P)^2 \right) dt \right] \tag{141} \]
\[ = \mathrm{cmse}_{P,Q}(T) - \mathrm{cmse}_{P,P}(T) \tag{142} \]
\[ = 2 D(P_{Y_0^T} \| Q_{Y_0^T}), \tag{143} \]

where (141) follows from the orthogonality property of estimators, and (143) follows from (30). ■
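
The chain of equalities in this proof can be checked in a case where every term has a closed form. The sketch below assumes a DC input with $X \sim \mathcal{N}(0,1)$ under P and $X \sim \mathcal{N}(0, \sigma_Q^2)$ under Q, and takes T = snr. Since $Y_t/\sqrt{t}$ is a sufficient statistic distributed as $\sqrt{t} X + N$, the causal errors reduce to integrals of scalar errors over the SNR, and the output laws at the final time are zero-mean Gaussians. The function names and the parameter `q_var` (standing for $\sigma_Q^2$) are hypothetical.

```python
import math

def mse_mismatched(g, q_var):
    """E_P[(X - E_Q[X | Y_g])^2] with Y_g = sqrt(g) X + N, X ~ N(0,1) under P.

    The Q-optimal estimator is q_var * sqrt(g) * Y_g / (1 + q_var * g).
    """
    return (1.0 + q_var * q_var * g) / (1.0 + q_var * g) ** 2

def mmse_matched(g):
    """E_P[(X - E_P[X | Y_g])^2] for the standard Gaussian input."""
    return 1.0 / (1.0 + g)

def excess_causal_error(snr, q_var, steps=20000):
    """Midpoint-rule value of int_0^snr [mse_mismatched - mmse_matched] dg."""
    h = snr / steps
    total = 0.0
    for k in range(steps):
        g = (k + 0.5) * h
        total += mse_mismatched(g, q_var) - mmse_matched(g)
    return total * h

def relative_entropy_outputs(snr, q_var):
    """D(P_Y || Q_Y) for P_Y = N(0, 1 + snr) and Q_Y = N(0, 1 + q_var * snr)."""
    r = (1.0 + snr) / (1.0 + q_var * snr)
    return 0.5 * (r - 1.0 - math.log(r))

if __name__ == "__main__":
    snr, q_var = 2.0, 3.0
    print(excess_causal_error(snr, q_var), 2.0 * relative_entropy_outputs(snr, q_var))
```

The excess error is what Ito's isometry identifies with the variance of M(T), so in this example the variance of the mismatch tracking error is twice the relative entropy between the output laws.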
We now present the proofs of the pointwise results generalized to channels with feedback. The techniques are similar to the ones we use in order to prove the previous results for pointwise Duncan and mismatch.

Proof of Theorem 6: In proving the pointwise result for channels with feedback, we use the same idea that we employed in the pointwise treatment of Duncan's theorem. We use Girsanov theory to characterize the likelihood ratio of the conditional and marginal laws of $Y_0^T$, in terms of the filtering error in estimating $\phi_t$. Let $\mu$ be the standard Wiener measure. Then,

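\[ \log \frac{dP_{Y_0^T|X_0^T}}{dP_{Y_0^T}} = \log \left( \frac{dP_{Y_0^T|X_0^T}}{d\mu} \Big/ \frac{dP_{Y_0^T}}{d\mu} \right) \tag{144} \]
\[ = \log \frac{dP_{Y_0^T|X_0^T}}{d\mu} - \log \frac{dP_{Y_0^T}}{d\mu}. \tag{145} \]

Using the formula for likelihood ratios in the presence of white noise (cf. [5], [11], [14]), we get the following almost-sure characterizations for the log Radon-Nikodym derivatives:

\[ \log \frac{dP_{Y_0^T|X_0^T}}{d\mu} = \int_0^T \phi_t \, dY_t - \frac{1}{2} \int_0^T \phi_t^2 \, dt, \tag{146} \]
\[ \log \frac{dP_{Y_0^T}}{d\mu} = \int_0^T \hat{\phi}_t \, dY_t - \frac{1}{2} \int_0^T \hat{\phi}_t^2 \, dt, \tag{147} \]

where

\[ \hat{\phi}_t = E[\phi_t \mid Y_0^t] \tag{148} \]

is the causal estimate of $\phi_t$ given the observations up until t. From equations (145), (146) and (147), we get that

\[ \log \frac{dP_{Y_0^T|X_0^T}}{dP_{Y_0^T}} = \int_0^T (\phi_t - \hat{\phi}_t) \, dY_t - \frac{1}{2} \int_0^T (\phi_t^2 - \hat{\phi}_t^2) \, dt. \tag{149} \]

We recall that

\[ dY_t = \phi_t \, dt + dW_t, \tag{150} \]

and simplify (149) to get

\[ \log \frac{dP_{Y_0^T|X_0^T}}{dP_{Y_0^T}} = \frac{1}{2} \int_0^T (\phi_t - \hat{\phi}_t)^2 \, dt + \int_0^T (\phi_t - \hat{\phi}_t) \, dW_t. \tag{151} \]

Recall that

\[ D_\phi(T) = \log \frac{dP_{Y_0^T|X_0^T}}{dP_{Y_0^T}} - \frac{1}{2} \int_0^T (\phi_t - \hat{\phi}_t)^2 \, dt. \tag{152} \]

Combining with (151), we get the desired result,

\[ D_\phi(T) = \int_0^T (\phi_t - \hat{\phi}_t) \, dW_t \quad \text{a.s.} \tag{153} \]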



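We now use Theorem 6 along with Ito's Isometry to directly yield the result in Corollary 7:

\[ \mathrm{Var}(D_\phi(T)) = E\left[ \left( \int_0^T (\phi_t - \hat{\phi}_t) \, dW_t \right)^2 \right] \tag{154} \]
\[ \overset{(a)}{=} \int_0^T E\left[ (\phi_t - \hat{\phi}_t)^2 \right] dt \tag{155} \]
\[ = \mathrm{cmmse}_\phi(T), \tag{156} \]

where (a) follows from the Isometry property of stochastic integrals.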

Note: Theorem 8 and Corollary 9 can be proved using the same techniques we used to establish the previous results for the non-mismatched case in Theorem 6 and Proposition 4 for the mismatched setting without feedback. Since the proofs follow directly, we omit them from the present discussion.

We now explore the proof of the pointwise I-MMSE relationship for processes. We recall that in this setting, we have used a Brownian sheet process (52) to place the input and output processes on the same probability space on the time-snr plane. In this proof we use higher dimensional Girsanov theory to characterize the Radon-Nikodym derivative for vector processes. The proof is given below.



Proof of Theorem 10:

We first present a result which can be established using Girsanov theory for higher dimensions for the continuous time Gaussian channel.

Let $\mathbf{X} \in \mathbb{R}^M$ be a real-valued random vector governed by the law $P_{\mathbf{X}}$. This acts as input to a Gaussian channel (at SNR = 1) to yield the output $\mathbf{Y}_\gamma$ as a function of the observation time $\gamma \in [0, \mathrm{snr}]$. The input and output are related by the following equation:

\[ \mathbf{Y}_\gamma = \gamma \mathbf{X} + \mathbf{W}_\gamma, \quad \gamma \in [0, \mathrm{snr}], \tag{157} \]

where $\mathbf{W}_\gamma$ is a standard Brownian motion in M dimensions. In this setting we are interested in the "tracking error" between the log Radon-Nikodym derivative and the integral of half the squared error, in the time interval [0, snr]. The corresponding definitions for the vector case are listed below:

\[ I_1 = \log \frac{dP_{\mathbf{Y}_0^{\mathrm{snr}}|\mathbf{X}}}{dP_{\mathbf{Y}_0^{\mathrm{snr}}}}, \tag{158} \]
\[ I_2 = \frac{1}{2} \int_0^{\mathrm{snr}} \| \mathbf{X} - E[\mathbf{X} \mid \mathbf{Y}_0^\gamma] \|^2 \, d\gamma. \tag{159} \]

Lemma 12: Let $Z = I_1 - I_2$, where $I_1$ and $I_2$ are defined in (158)-(159). Then,

\[ Z = \int_0^{\mathrm{snr}} (\mathbf{X} - E[\mathbf{X} \mid \mathbf{Y}_0^\gamma]) \cdot d\mathbf{W}_\gamma \quad \text{a.s.} \tag{160} \]



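Proof: The proof is very similar to that of Theorem 3. Let $\mu$ be the standard Wiener measure. We note that

\[ \log \frac{dP_{\mathbf{Y}_0^{\mathrm{snr}}|\mathbf{X}}}{dP_{\mathbf{Y}_0^{\mathrm{snr}}}} = \log \left( \frac{dP_{\mathbf{Y}_0^{\mathrm{snr}}|\mathbf{X}}}{d\mu} \Big/ \frac{dP_{\mathbf{Y}_0^{\mathrm{snr}}}}{d\mu} \right) \tag{161} \]
\[ = \log \frac{dP_{\mathbf{Y}_0^{\mathrm{snr}}|\mathbf{X}}}{d\mu} - \log \frac{dP_{\mathbf{Y}_0^{\mathrm{snr}}}}{d\mu} \quad \text{a.s.} \tag{162} \]

Define

\[ \hat{\mathbf{X}}_\gamma = E[\mathbf{X} \mid \mathbf{Y}_0^\gamma], \tag{163} \]

for $\gamma \in [0, \mathrm{snr}]$. Now, using the Girsanov theorem (cf. [7, Section 3.5]) for higher dimensions, we can write the log Radon-Nikodym derivatives of the conditional and marginal distributions of $\mathbf{Y}$, with respect to $\mu$, as follows:

\[ \log \frac{dP_{\mathbf{Y}_0^{\mathrm{snr}}|\mathbf{X}}}{d\mu} = \int_0^{\mathrm{snr}} \mathbf{X} \cdot d\mathbf{Y}_\gamma - \frac{1}{2} \int_0^{\mathrm{snr}} \mathbf{X} \cdot \mathbf{X} \, d\gamma \quad \text{a.s.} \tag{164} \]
\[ \log \frac{dP_{\mathbf{Y}_0^{\mathrm{snr}}}}{d\mu} = \int_0^{\mathrm{snr}} \hat{\mathbf{X}}_\gamma \cdot d\mathbf{Y}_\gamma - \frac{1}{2} \int_0^{\mathrm{snr}} \hat{\mathbf{X}}_\gamma \cdot \hat{\mathbf{X}}_\gamma \, d\gamma \quad \text{a.s.} \tag{165} \]

Combining the above expressions with (162) and using (157), we get

\[ I_1 = \int_0^{\mathrm{snr}} (\mathbf{X} - \hat{\mathbf{X}}_\gamma) \cdot (\mathbf{X} \, d\gamma + d\mathbf{W}_\gamma) - \frac{1}{2} \int_0^{\mathrm{snr}} (\mathbf{X} \cdot \mathbf{X} - \hat{\mathbf{X}}_\gamma \cdot \hat{\mathbf{X}}_\gamma) \, d\gamma, \tag{166} \]

which upon simplification reduces to

\[ I_1 = \frac{1}{2} \int_0^{\mathrm{snr}} \| \mathbf{X} - \hat{\mathbf{X}}_\gamma \|^2 \, d\gamma + \int_0^{\mathrm{snr}} (\mathbf{X} - \hat{\mathbf{X}}_\gamma) \cdot d\mathbf{W}_\gamma \quad \text{a.s.} \tag{167} \]

In other words,

\[ Z = I_1 - I_2 \tag{168} \]
\[ = \int_0^{\mathrm{snr}} (\mathbf{X} - \hat{\mathbf{X}}_\gamma) \cdot d\mathbf{W}_\gamma \quad \text{a.s.} \tag{169} \]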

Armed with this characterization in (160) of the input-output information density in higher dimensions for the Gaussian channel in (52), we now proceed to establish a relationship akin to the I-MMSE for a specific class of processes. We begin by looking at piecewise constant scalar processes. The result we obtain (we shall argue) can be extended just as easily to the general class of finite power continuous-time processes using approximation arguments.

Let $X_t$ be a continuous time process, and $Y_0^T$ be its noise corrupted version through the channel in (19), observed in the interval [0, T]. Fix $M \in \mathbb{N}$. We let $X_t$ be a piecewise-constant process such that



Xt = X, for all ^ < t < —T, 



(170) 



where Xi ^ P, and 1 < i < M. 

Fix snr > 0. From [2] we know that the mutual information is the integral over SNR of the smoothing error,

1" 



Note that. 



E 

where 



log 



d-P-r-T, (snr) 



= /(Xj;fo^^(^")) ^ /({XjfI,;{rJ}f£i) = /(X; Y,„,) = E 



T 

y W = ^ — X- + w^w 
7 M ' 



log ■ 



Y„„|X 



dPy 



(171) 



(172) 



(173) 



and 7 e [0, snr] denotes the signal to noise ratio. 

. The random vectors X e M^^ and e R^^ denote the collection of variables {X,}fi^ and {Kj*'}fi;^ 



respectively. 



• Wj' denotes the Brownian sheet process Wt'^'^ (that drives the channel noise in (|52[)), observed in the interval 
[^^^^p-T, jjT]. The M-dimensional random vector W-y = {W^^'jfii denotes an M-dimensional Brownian 
motion indexed by 7. 

Note that log '^o '™ depends only on the joint distribution between {Xi}f£-^ and {i^sm }f£i. Therefore it is the 



same as log ,p'°''^ , not only under expectation, but also pointwise a.s., i.e. 



log 



dPyT.,.n,^^r 
dP^T.(,n,) 



= log- 



dP 



(174) 



We now apply the identity in (160 1 directly to the log Radon-Nikodym derivative (quantity inside brackets in 
r.h.s. of ( |174| i) to get 



log 



dP 



Ys„,|X 



dPY,, 



1 

2 Jo 

2/ 



X-E[X|Y;7]||'d7+ / (X - E[X|Y^]) • dW, 



{Xt'^[Xt\Y;^;i]fdtd^+ / (X-E[X|Y^]). dW. 



- / sse(T,7) d7 + / ^(X, - E[X,|Y; 7]) dWi;- 
^ Jo Jo i=l 



(175) 
(176) 
(177) 



where $\mathrm{sse}(T, \gamma)$ represents the squared smoothing error in an observation window of duration T at signal to noise ratio $\gamma$. Note that $E[\mathrm{sse}(T, \gamma)] = \mathrm{mmse}(\gamma)$.
We can write the difference as,

7V(T)^log— I ^ 

dPv>T.(s„r) 
n 



Note that by Ito's Isometry property, 

VariN{T)) 



M 



sse(T,7)d7 = / V(X,-E[X,|Y;7])dM/W 

.-1 "'0 



IX- EfXIY; 



(i7 



(li-E[X,|fo^;7])2c;t 



(i7 



r" (/) 

/ mmse(7) = 2/(snr), 

"'0 



(178) 
(179) 

(180) 
(181) 
(182) 



which is also an interesting result. Note that (f) follows from the continuous-time I-MMSE relationship. We now argue that the result that we just established for piecewise constant processes carries through for general continuous time processes with finite average power. We refer the reader to [4, Section IV.C] for details in making this approximation, and provide the sketch below.



The main idea is to induce a stepwise process Xq"'''^ defined by 



X 



in) _ 



1 

2"r 



(i+l)2-"T 



Xt dt for t e (i2-"r, {i + 1)2-"T]. 



(183) 



i2-"T 



(i+l)2-"T 



Since processes in P have finite energy, the integral J^2-^t 
We now note that. 



Xtdt exists and is finite P-a.s. and is in L2{P). 



X. 



(n),T n-Kx) yT 







X^ in L\dtdP). 



(184) 



Further, the Radon-Nikodym derivatives of the induced measures in the stepwise process converge to the actual Radon-Nikodym derivatives in a $P_{Y_0^T}$-a.s. sense. Therefore, the approximation allows us to generalize our results to the class of all finite power continuous time processes. ■

Duncan's theorem proves, as a corollary, the equivalence of the causal and anti-causal squared errors. We now present a proof for Proposition 5, where we establish a pointwise version of the result.

Proof of Proposition 5: We first note the following relationship, which follows directly from Theorem 3:

\[ \log \frac{dP_{Y_0^T|X_0^T}}{dP_{Y_0^T}} = \frac{1}{2} \int_0^T (X_t - E[X_t \mid Y_0^t])^2 \, dt + \int_0^T (X_t - E[X_t \mid Y_0^t]) \cdot dW_t \quad \text{a.s.} \tag{185} \]



Let us recall the transformations defined in (62 1 - (64 1. Note in addition, that E[Xf|yg*] is adapted to the filtration 
induced by {X'^ , B'^), so 

(186) 



(Xt - E[Xt\Y^]) ■ dBt 
is well defined in the standard sense of an Ito integral. 
Applying the relation in (185), with the associations $X_0^T \to \tilde{X}_0^T$, $W_0^T \to B_0^T$ and $Y_0^T \to \tilde{Y}_0^T$, gives us



log ■ 



dP^T 



{Xt ~ E[Xt\Y^]y dt - 



{Xt - E[Xt\Y*]) ■ dBt a.s. 



(187) 



On the other hand, since there are one-to-one transformations to get from X to X and from Y to Y , we have 



log ^log a.s. 



J n 



dPyl 



(188) 



Combining (188i, (187i and (185 1, we get the following equality (in the almost sure sense): 

\j\x,-E[X,\Y^]fdt-\j^ 



{Xt^E.[Xt\Y^]Ydt 



{Xt-E[Xt\Y^])-dBt 







Simplifying the above expression further, we get 



{X, - £[X,\Y^]f dt-\j {Xt- E[Xt\Y^]r dt 



{Xt - E[Xt\Y^]) ■ dBt 



{Xt - E[Xt\Y^]) ■ dWt 
(189) 



(Xt - E[Xt\Y*]) ■ dWt 



E[Xt\Y^]dWt 



E[Xt\Y^])dBt 



where the second equality holds true for all processes that satisfy the following benign condition in (192i: 



XtdBt 



XtdWt a.s. 



(190) 
(191) 

(192) 



Note that (192) is simple to verify for the class of piecewise constant processes on the interval [0, T]. By approximation arguments similar to the ones given in the proof of Theorem 10, one can establish the equality (192) for all finite power continuous time processes. ■

Having established the pointwise I-MMSE relationship for continuous-time processes, we now present a pointwise version of the causal vs. non-causal error relationship discussed in Subsection II-B5. In doing so, for simplicity, we restrict our attention to the class of piecewise-constant processes. Our characterization is valid for all piecewise constant processes, and (appealing to the arguments given in the proof of Theorem 10) therefore holds for all continuous time processes that can be approximated as such.
Let $X_t$ denote a piecewise-constant process, such that



Xt = X, for all ^ < t < — T, 



(193) 



where ^ P, and 1 < i < M. We fix snr=l, and let the output be denoted by {i^/^''}o<t<T for the channel 
described in (|52|. I.e., 



dY}^'^ ^-fXtdt + dWt^'^\ 



(194) 



for 7 e [0,1] and t £ [0,T]. We first note from Proposition |3] that the input-output information density can be 
written as 



log 



dP^T.ii) 

-< n 



1 



{Xt-£[Xt\Y^ 



t,(i)n2 



dt 



{Xt-E[Xt\Y^^'^^\)- dWi" a.s. 



(1) 



(195) 
Recall the definitions of $\mathbf{X}$, $\mathbf{Y}_\gamma$ and $\mathbf{W}_\gamma$ in the discussion following (173) in the proof of Theorem 10. Also, in establishing Theorem 10 we derive a relationship for the input-output information density from (174) and (176), namely:



log 



^ n 



(X - E[X|Y^]) • dW-y 



a.s. 



(196) 



We now combine ( 195 i and ( 196 1 to write down the pointwise characterization of the filtering and smoothing errors, 







{Xt - E[Xt\Y^'^'^]f dt - 
{Xt-E[Xt\Y„'^'^^'^]fdtdj



2(X- E[X|Y^]) • dW^ 



2{Xt ~ E[Xt\Y^-''^^] 



dWt^'^^ a.s. 



(197) 



/O JQ 

Thus, the difference between the filtering and smoothing errors has an explicit characterization in terms of stochastic integrals. It is instructive to note that taking expectation on both sides of (197) establishes the identity in (50), for snr = 1, by using the fact that the Ito integrals in the r.h.s. of (197) have zero mean.

We now present proofs of results stated in Section III. In Proposition 6, we established that E[Z|X] = 0 for all underlying distributions of the signal X that have finite variance. The proof invokes the setting in [3] and is presented below.

Proof of Proposition 6: We denote the mean squared error due to mismatch at signal to noise ratio $\gamma > 0$ by $\mathrm{mse}_{P,Q}(\gamma)$, defined in (14). Note from [3, Section V] that we have,



\[ I(X; \sqrt{\mathrm{snr}}\, X + N) = \int D\left( N + \sqrt{\mathrm{snr}}\, x \,\|\, N + \sqrt{\mathrm{snr}}\, X \right) dP_X(x) \]

From (199) and (201) it is clear that

(206)

Also,



E[/.m.ilog(l + snr) + ^l^(X^-l) 



(207) 



for all underlying processes $P_X$. Thus, not only are the random quantities above equal in expectation, but they are also equal in conditional expectation on X. ■

We now turn our attention to the alternative coupling discussed in Section III-B. In Lemma 11 we derived an expression for the pointwise tracking error for the Additive Standard Gaussian coupling (78) for a general input signal X. In the following, we present a proof of Lemma 11.

Proof of Lemma 11:

Let us assume X is distributed according to $P_X$. Let $f_Z(z)$ denote the probability density function of a standard Gaussian random variable. Note for the setting in (78) that,



Py^\x{v\x) 



and 



Then, 



PY.iy) = j PY,\x{y\i)dPi. 

Py \xiY^\X) 

We now look at the differential form of the I-MMSE relationship. Namely, 

= -mmse(7), 



or its pointwise equivalent 



Define 



d Py \xiY^\X) 1 



= 0. 



Differentiating (210) w.r.t. snr, we have

(217)
(218)

where:

(215) follows since $P_{Y_\gamma|X}(Y_\gamma \mid X) = f_Z(Y_\gamma - \sqrt{\gamma}\, X) = f_Z(N)$ (and hence does not depend on $\gamma$).

(216) follows since $Y_\gamma = \sqrt{\gamma}\, X + N$ is the explicit dependence of $Y_\gamma$ on $\gamma$. Using this, and differentiating with respect to $\gamma$, we obtain the required expression.

(218) follows since the integral is with respect to the conditional law of the input given $Y_\gamma$, taken over a dummy variable of integration, while keeping X and $Y_\gamma$ constant.



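Combining (212), (218) and (213), we get

\[ \frac{dI_1}{d\gamma} - \frac{1}{2} (X - E[X \mid Y_\gamma])^2 = \frac{1}{2\sqrt{\gamma}} \left[ X Y_\gamma - (\sqrt{\gamma}\, X + Y_\gamma) E[X \mid Y_\gamma] + \sqrt{\gamma}\, E[X^2 \mid Y_\gamma] \right] - \frac{1}{2} (X - E[X \mid Y_\gamma])^2 \]
\[ = \frac{1}{2} \left\{ E[X^2 \mid Y_\gamma] - X E[X \mid Y_\gamma] - (X - E[X \mid Y_\gamma])^2 + \frac{Y_\gamma}{\sqrt{\gamma}} (X - E[X \mid Y_\gamma]) \right\}. \]

Note now that Z according to Definition 1 is

\[ Z = \log \frac{dP_{Y_{\mathrm{snr}}|X}}{dP_{Y_{\mathrm{snr}}}} - \frac{1}{2} \int_0^{\mathrm{snr}} (X - E[X \mid Y_\gamma])^2 \, d\gamma \tag{219} \]
\[ = \int_0^{\mathrm{snr}} \left[ \frac{dI_1}{d\gamma} - \frac{1}{2} (X - E[X \mid Y_\gamma])^2 \right] d\gamma \tag{220} \]
\[ = \int_0^{\mathrm{snr}} Z_\gamma \, d\gamma. \tag{221} \]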
V. Conclusion

We consider the scenario of mean square estimation of a signal observed through additive white Gaussian noise. 
We formulate classical information and estimation relationships in these contexts as expectation identities. We 
explicitly characterize the input-output information density for both scalar and continuous time Gaussian channels. 
Using this characterization, which relies on Girsanov theory, we obtain pointwise representations of these identities 
with the expectations removed and discover that these random quantities also have classical information-estimation 
links. In particular, canonical measures of information and estimation appear to be bridged by the second moment of 
the pointwise tracking error between the information density and the scaled filtering error. In this manner we present 
pointwise relations for Duncan's theorem, mismatched estimation, channels with feedback, the I-MMSE relationship 
as well as the causal vs. non-causal and causal vs. anticausal errors. A special treatment for scalar estimation is also 
provided where we present and discuss alternative couplings to the Brownian motion corrupted channel. We also 
provide applications of these results to obtain new and interesting relations in the information-estimation arena. 

The first and second moments of the tracking error in the Gaussian setting have direct implications on information 
and estimation relations. In future work, we would like to see whether similar implications emerge for higher order 
moments as well. In addition it would be interesting to investigate whether pointwise relationships similar to the 
Gaussian case, hold also for the Poissonian channel, where links between estimation and information have been 
recently uncovered in p3) for a natural loss function. 






Acknowledgement 

The authors thank Rami Atar for valuable discussions. This work has been supported under a Stanford Graduate 
Fellowship, NSF grant CCF-0729195, and the Center for Science of Information (CSoI), an NSF Science and 
Technology Center, under grant agreement CCF-0939370. 

References 

[1] T. E. Duncan, "On the calculation of Mutual Information," SIAM J. Appl. Math., vol. 19, pp. 215-220, Jul. 1970. 

[2] D. Guo, S. Shamai and S. Verdú, "Mutual Information and minimum mean-square error in Gaussian channels", IEEE Trans. Information Theory, vol. IT-51, no. 4, pp. 1261-1283, Apr. 2005. 
[3] S. Verdú, "Mismatched Estimation and relative Entropy", IEEE Trans. Information Theory, vol. 56, no. 8, pp. 3712-3720, Aug. 2010. 
[4] T. Weissman, "The Relationship Between Causal and Noncausal Mismatched Estimation in Continuous-Time AWGN Channels", IEEE Trans. Information Theory, vol. 56, no. 9, pp. 4256-4273, September 2010. 
[5] T. T. Kadota, M. Zakai, J. Ziv, "Mutual Information of the White Gaussian Channel With and Without Feedback", IEEE Transactions on Information Theory, vol. IT-17, no. 4, July 1971. 
[6] Y. Polyanskiy, H. V. Poor, S. Verdú, "New Channel Coding Achievability Bounds", IEEE Int. Symposium on Information Theory 2008, Toronto, Ontario, Canada, July 6-11, 2008. 
[7] I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus, 2nd ed. Springer-Verlag, New York, 1988. 
[8] T. Weissman, Y.-H. Kim, H. H. Permuter, "Directed Information, Causal Estimation, and Communication in Continuous Time", submitted to IEEE Transactions on Information Theory, Sep. 2011. 
[9] M. Zakai, "On Mutual Information, Likelihood Ratios, and Estimation Error for the Additive Gaussian Channel", IEEE Transactions on Information Theory, vol. 51, no. 9, September 2005. 
[10] D. Guo, "Gaussian Channels: Information, Estimation and Multiuser Detection," Ph.D. Thesis, Princeton University, 2004 
[11] T. Kailath, "The structure of Radon-Nikodym derivatives with respect to Wiener and related measures," Ann. Math. Statist., vol. 42, no. 3, pp. 1054-1067, 1971. 

[12] I. V. Girsanov, "On transforming a certain class of stochastic processes by absolutely continuous substitution of measures," Theory Probab. Appl., vol. 5, pp. 285-301, 1960. 
[13] J. M. Steele, Stochastic Calculus and Financial Applications, Springer, 2010. 

[14] T. Kailath, "A General Likelihood-Ratio Formula for Random Signals in Gaussian Noise", IEEE Transactions on Information Theory, vol. IT-15, no. 3, May 1969. 

[15] R. Atar, T. Weissman, "Mutual Information, Relative Entropy, and Estimation in the Poisson Channel", IEEE Transactions on Information Theory, vol. 58, no. 3, March 2012. 



